ABSTRACT

Since its establishment with a 1986 grant from the 
Fund for the Improvement of Postsecondary Education, the Assessment 
Resource Center (ARC) at the University of Tennessee has worked with 
state coordinating boards, other institutions involved in assessment 
leadership, institutions seeking guidance, and participants in 
conferences and seminars given by the ARC. The Center accomplished 
its objectives of establishing working relationships with numerous 
institutions, preparing printed materials, sponsoring workshops, and 
developing a consortium of experienced assessment practitioners. In 
addition, Center personnel founded a national publication, planned an 
international seminar, and supported a cross-national study of 
assessment in higher education. This final report outlines the 
project's background and purpose, summarizes project impact, and 
documents plans for continuation and dissemination. Appendices, which 
comprise the bulk of the report, include: (1) an overview of the ARC; 
(2) an annotated bibliography of 12 items on assessment and a list of 
23 representative assessment programs; (3) research on the College 
Outcome Measures Project, with an 11-item annotated bibliography; (4) 
title pages and tables of contents of ARC publications; and (5) resource 
appendices from "Performance and Judgment: Essays on Principles and 
Practice in the Assessment of College Student Learning" edited by 
Clifford Adelman, containing an annotated bibliography of 
approximately 75 items and reviews of 22 assessment instruments. 
(JDD) 

ABSTRACT 



Since its establishment with a FIPSE grant in 1986, the Assessment Resource 
Center (ARC) has worked with state coordinating boards, other institutions involved in 
assessment leadership, institutions seeking guidance from the center, and several hundred 
participants in conferences and seminars given by the Center. The Center accomplished 
all of its original objectives, establishing working relationships with numerous institutions, 
preparing printed materials, sponsoring workshops, and developing a consortium of 
experienced assessment practitioners. In addition, Center personnel founded a national 
publication, planned an international seminar, and supported a cross-national study of 
assessment in higher education. We have worked to meet the increasing demand for 
materials and assistance, and hope to continue to contribute to the assessment literature 
and to participate in ongoing discussions about assessment at national and international 
levels. 

EXECUTIVE SUMMARY 

Project Overview 

The national Assessment Resource Center project began at the University of 
Tennessee, Knoxville in September 1986. Outcomes assessment had become a priority 
in higher education as a result of institutional responses to recommendations in several 
national reports and in actions taken by a number of states and six regional accrediting 
boards. The state of Tennessee had become a leader in requiring public institutions to 
report the results of outcomes assessment activities. The University of Tennessee, 
Knoxville (UTK) had achieved national recognition for its assessment program and was 
receiving requests for assistance from other institutions across the nation who were 
beginning to develop such programs of their own. Thus representatives of the University 
proposed that FIPSE provide support to establish a national Assessment Resource Center 
at UTK. 

Since its establishment with a FIPSE grant in 1986, the Assessment Resource 
Center (ARC) has worked with state coordinating boards, other institutions involved in 
assessment leadership, institutions seeking guidance from the center, and several hundred 
participants in conferences and seminars given by the Center. The Center accomplished 
all of its original objectives, establishing working relationships with numerous institutions, 
preparing printed materials, sponsoring workshops, and developing a consortium of 
experienced assessment practitioners. In addition, Center personnel founded a national 
publication, planned an international seminar, and supported a cross-national study of 
assessment in higher education. 

Background and Purpose 

In 1984 educational leaders from UTK and several other institutions with successful 
assessment programs began meeting in small groups sponsored by the Association of 
American Colleges (AAC) and the American Association for Higher Education (AAHE) to 
discuss the need for national leadership for assessment. As a research university with 
a nationally and internationally recognized comprehensive outcomes assessment program, 
UTK had the resources necessary to establish and maintain a national center to provide 
information and coordinate developmental efforts on the topic of assessment. Since its 
establishment at UTK, the ARC has provided services to hundreds of participants through 
campus consultations, conference papers, publications, a national newsletter, on-site 
conferences, and an international conference. 

During the period of its grant, the ARC has operated with a part-time director, a 
full-time associate director, a graduate assistant, and a secretary, at a state institution 
with an enrollment of 25,000 undergraduate and graduate students. Interest in 
assessment has increased over the life of the project, and the need for the services 
originally proposed has grown annually both in terms of the numbers of new learners to 
orient and new topics to discuss. We have learned that there is a greater need for what 
we proposed than we could ever have imagined, and our redefinition of the project has 
been to extend our services to a larger and more varied audience. 

Project Description 

The ARC proposal included four objectives: (1) to develop a bibliography of 
instruments and practices; (2) to gain recognition as a national resource center; (3) to 
provide information to others; and (4) to be active in discussing and offering solutions 
for assessment issues. For the ARC to accomplish these objectives we realized that the 
bibliography should be iterative, the center would have to gain visibility prior to 
establishing leadership, information would best be provided in person to participants 
beginning to establish assessment programs, and small forums would best unite 
individuals focused on specific issues. 

Developing an assessment instrument bibliography became the first priority of the 
Center because that document was needed for distribution at the first workshop. 
Separate bibliographies on practices then were planned. A related project funded by the 
Office of Educational Research and Improvement enabled us to compile and publish one 
of the latter bibliographies. 

Frequent presentations at national and regional meetings became the principal 
means of gaining visibility for the ARC, and smaller conferences, consultations, 
workshops, and campus visits furthered achievement of this objective. Publishing 
important papers helped establish the ARC as a leader in assessment, and during the first 
year we began to make our printed materials available by mail upon request. 

Project Results 

Services in the form of materials and consultations have been provided to state 
agencies and colleges and universities in 49 states, the District of Columbia, Puerto Rico, 
and 5 foreign countries. Presentations have been made at the annual meetings of the 
AAC, AAHE, AEA, AERA, AIR, and FIPSE project directors. Three workshops were held 
in 1987 and 1988, with a total of more than 500 people attending. With help from the 
FIPSE project grant, the ARC published 27 papers, reports, and books; made 85 invited 
presentations off campus; developed a consortium of campus-based leaders; assumed a 
leadership role in the assessment movement; and cooperated with the AAHE Assessment 
Forum, the leading higher education organizations, and several state higher education 
agencies. In 1988 Jossey-Bass Publishers, Inc. invited the project director to edit the first 
national quarterly on assessment in higher education, Assessment Update, which reaches 
1800 subscribers. 

Over the three years of this project, a change occurred in the type of information 
requested, moving from the most basic how-to-get-started inquiries to questions about 
more substantive methodological issues, such as the best way to measure achievement. 
Also, there has been growing interest in such non-traditional measures as portfolios and 
self-assessment. Presentations have changed from descriptions of UTK's assessment 
program to discussions of a variety of methodologies for assessing student outcomes. 
Contacts are handled differently now, by mail and telephone, with printed materials being 
sent out, rather than by receiving visitors at UTK. Now faculty as well as administrators 
in institutions of higher education, including UTK, seek to improve learning for their 
students as they respond to the various external pressures to become involved in 
assessment. 

The ARC gauged the impact of the project by the amount of information 
distributed and the quality of that information as rated by users via written evaluations 
of printed materials and workshops. Respondents to mailed surveys provided a mean 
overall rating of ARC materials of 4.1 on a 5-point scale ranging from poor (1) to 
excellent (5). Three workshops were consistently given ratings in excess of 7 points on 
a 10-point scale. Presentations, articles, and campus visits increased, and 8 major 
reference works, 19 research reports, and an international conference and study-group 
report were produced. The four objectives were accomplished, and the national interest 
in assessment continues undiminished. 

Continuation and Dissemination 

Plans for continuation include maintaining our leadership position with AAC, AEA, 
AAHE, AERA, and AIR presentations this year, as well as papers, symposia, and 
workshops. Evaluation activities to be continued include monitoring the quality of our 
materials via user surveys. We also plan to assess the quality of future workshops using 
printed evaluation forms similar to those developed in the course of the ARC project. 
In a major step following completion of the project, the University of Tennessee 
established the Center for Assessment Research and Development (CARD) to continue 
the work of the ARC. New projects for CARD include development of an employer 
survey and a second international conference, this one at St. Andrews University in 
Scotland in July 1990. 

Conclusions 

Insights gained as a result of grant activity stem from our awareness of the public's 
growing interest in accountability and favorable reactions to workshops and to assessment 
materials that we have made available. We have worked to meet the increasing demand 
for materials and assistance, and hope to continue to contribute to the assessment 
literature and to participate in ongoing discussions about assessment at national and 
international levels. 

OVERVIEW, BACKGROUND, AND PURPOSE 
In 1986, when this project was proposed, there were clear indications that 
outcomes assessment in higher education was becoming an issue of high priority for 
college and university administrators and faculty. The following items illustrate this 
point: 



Several national reports, notably the National Institute of Education's (NIE) 
Involvement in Learning and Integrity in the College Curriculum from the 
Association of American Colleges (AAC), had called for institutions to assess 
the outcomes of student learning. 

A national conference planned to investigate response to the NIE report one 
year after its release in 1984 became the first National Conference on 
Assessment in Higher Education and attracted over 700 people - nearly 
double the early estimates of attendance. 

Two task forces of the National Governors' Conference had recommended 
that colleges and universities begin to furnish concrete evidence of their 
accountability for student intellectual development. 

The six regional accrediting associations had revised their standards to 
require evidence of student accomplishment of the educational objectives set 
by each institution. 

In Tennessee and South Dakota state coordinating boards had required 
public institutions to administer certain standardized tests and report 
students' scores to the respective boards. In other states, such as New 
Jersey and Virginia, coordinating agencies had supported pilot projects in 
selected institutions to encourage public colleges and universities to plan 
individualized outcomes assessment programs. 

In 1986 only three institutions - Alverno College, Northeast Missouri State 
University, and the University of Tennessee, Knoxville (UTK) - had acquired national 
recognition for their comprehensive outcomes assessment programs. Small groups of 
educational leaders convened in 1985 by the AAC and the American Association for 
Higher Education (AAHE) had expressed concern that the growing number of institutions 
interested in outcomes assessment had no central resource to consult for assistance. 
Faculty and administrators at Alverno, Northeast Missouri and UTK were becoming 
overwhelmed by calls for help from other institutions. 

UTK proposed that FIPSE support an Assessment Resource Center (ARC) at a 
research university for the purpose of addressing the burgeoning need for information, 
assistance, and leadership in the area of outcomes assessment in higher education. To 
quote the original project proposal, "As the research university with a nationally and 
internationally recognized comprehensive outcomes assessment program, The University 
of Tennessee, Knoxville is uniquely qualified to . . . develop and disseminate information, 
address concerns common to groups of institutions, and improve practice in the 
assessment of student outcomes in higher education." Specific objectives of the ARC 
were to: 

(1) Develop at least one annotated bibliography of assessment instruments and 
practices for distribution to other institutions. 

(2) Gain recognition throughout the academic community for the ARC. 

(3) Provide information about assessment materials and procedures to those 
seeking such information. 

(4) Contribute to the discussion and solution of assessment issues and problems 
by bringing together appropriate individuals for conversation and action. 

With excellent support from the faculty and administrative staff of the University 
of Tennessee and of the FIPSE staff, all of the original objectives of the Assessment 
Resource Center have been accomplished. The staff, consisting of a half-time director, 
a full-time associate director, a graduate assistant, and a secretary, has developed and 
disseminated a continuously updated annotated bibliography of assessment instruments. 
The ARC, now the Center for Assessment Research and Development, is sufficiently 
well-known to have provided information and/or assistance to colleges and universities in 
49 states, as well as institutions and coordinating agencies in a half-dozen foreign 
countries. 

Excellent working relationships have been established with virtually all of the other 
institutions that have received early recognition as assessment leaders: Alverno, 
Northeast Missouri, Miami-Dade Community College, James Madison University, Kean 
College of New Jersey, and King's College of Pennsylvania. Substantial amounts of staff time 
have been spent working with the state coordinating agencies of Tennessee, New Jersey 
and Virginia and with individual institutions in these states, as well as those in Missouri, 
California, Georgia, Pennsylvania, and South Carolina. In total, campus visits have been 
arranged to provide assistance to 52 institutions in 24 states and Puerto Rico. 

Four workshops, attracting between 95 and 204 participants each, were conducted 
with FIPSE assistance between 1986 and 1989. In a related development, an 
international seminar at Cambridge University for 45 individuals from 9 countries was 
supported by the University of Tennessee in July 1989. Also, in 1989 Jossey-Bass 
Publishers of San Francisco selected the project director, Trudy Banta, to edit Assessment 
Update, the first newsletter project ever undertaken by Jossey-Bass and the first periodical 
devoted exclusively to assessment in higher education. 

The project contributed to the solution of problems in the field through a series 
of original contributions to the research literature and the creation of a consortium of the 
field's most experienced practitioners that met periodically for discussion and action. 

Following through on its initial commitment to support the ARC when FIPSE funds 
were withdrawn, on July 1, 198? the University of Tennessee established the Center for 
Assessment Research and Development (CARD). This center continues to provide written 
materials and personal assistance, as well as annual workshops, for other institutions 
seeking advice about assessment. A second international seminar is scheduled at St. 
Andrews University in Scotland July 24-27, 1990. A new FIPSE grant will permit UTK 
to coordinate a consortium of Tennessee institutions planning to develop an employer 
survey to enhance their assessment efforts. 

PROJECT DESCRIPTION 
The goal of the Assessment Resource Center was to "enhance the quality of the 
educational experience for students by increasing the use of outcomes assessment 
procedures in evaluating the progress of individual students as well as the quality of 
programs and services within higher education". In order to attain this goal, four specific 
objectives were established to guide program implementation and serve as a basis for 
evaluation. A brief summary of the efforts made by Center staff to meet these objectives 
is presented in Appendix A. 

Objective 1: Develop at least one annotated bibliography of assessment 
instruments and practices. In attempting to meet the first objective during 1986, it 
became apparent to the Center's staff that more than one bibliography would be required. 
Furthermore, it was decided that the development of these bibliographies would be an 
iterative process. Given the interest in standardized outcomes measures, and facing a 
March 1987 deadline for the Center's first assessment workshop, the project director and 
associate director decided that a Bibliography of Assessment Instruments should be the 
first priority. Using information from the ETS Test Collection Annotated Bibliography of 
Tests, this first bibliography outlined the basic characteristics of each test and provided 
addresses for further information. Participation in the Office of Educational Research and 
Improvement's assessment audit project during the Center's second year helped to expand 
the Bibliography of Assessment Instruments, particularly in the areas of the measurement 
of critical thinking and motivation. The OERI assessment audit also provided the impetus 
for a second bibliography concerning articles on assessment practice. Edited versions of 
both the Bibliography of Assessment Instruments and the articles on assessment practice 
are contained in the Performance and Judgment volume published by OERI. 

The project director's editorship of a volume in the New Directions for Institutional 
Research series during the Center's second year provided a vehicle for developing a list 
of key resource people and institutions which was published in the New Directions 
volume. A copy of this chapter is presented in Appendix B. The materials developed 
in conjunction with the OERI assessment audit and the New Directions series were used 
to revise the Bibliography of Assessment Instruments and were presented to workshop 
participants during the second year of the project. To date, approximately 600 copies of 
the Bibliography of Assessment Instruments have been distributed. 

During the third year of the project, editorial changes were made in the 
Bibliography of Assessment Instruments and an enhanced review of the literature on 
assessment practice was made available. The review of assessment literature, entitled 
Assessment Bibliography, includes citations concerning assessment practice and has been 
bolstered by a detailed bibliography of assessment-related research conducted at UTK. 
During the final year of the project, the Center for Assessment Research and Development 
(the new parent organization for the Assessment Resource Center) assumed the role of 
editing Assessment Update, the first newsletter devoted to outcomes assessment. The 
Director of the Center serves as executive editor of the newsletter and the Associate 
Director provides a quarterly column on assessment measures. This responsibility has 
provided the Center with additional opportunities to identify resource people and 
disseminate information about assessment. 

For the future, work continues on a revision of the Bibliography of Assessment 
Instruments. The new bibliography will contain more information on institutional 
experiences using various assessment instruments. In addition, the Associate Director's 
column on assessment measures will provide another means of disseminating information 
about institutional experiences with assessment instruments. 

The current interest in measures of general education outcomes, coupled with 
UTK's extensive experience with the ACT-COMP exam, has motivated the Center's staff 
to develop a summary of research on the COMP exam. In addition, the staff is preparing 
an updated bibliography of assessment resource people and institutions. Excerpts from 
the summary of research on the ACT-COMP exam are included in Appendix C. 

Objective 2: Gain recognition throughout the academic community. In order to 
attain the second program objective, gaining recognition for the Assessment Resource 
Center, staff members relied on three methods of enhancing the Center's visibility: 
(1) making invited presentations at national and regional meetings; (2) presenting 
refereed papers at national and regional meetings; and (3) publishing articles on 
assessment. During the first year of the project much of the effort to gain recognition 
for the Center was directed toward making invited presentations. Forums for these 
presentations included the annual meetings of the American Association for Higher 
Education (AAHE), the American Educational Research Association (AERA), the Association 
for the Study of Higher Education (ASHE), the Association of American Colleges (AAC), 
the AAHE Assessment Forum, and ten other national organizations. Not only did these 
meetings offer an opportunity to announce the creation of the Assessment Resource 
Center, they also provided lists of participants that later were used in producing a mailing 
list for the brochure describing the ARC. During the first year, the project director 
continued to publish articles on assessment and involved the associate director in two of 
these articles. 

During the second year of the project, the Center's staff continued to make invited 
presentations and present competitively selected papers at annual meetings of the AAHE, 
AERA, AAC, the American Evaluation Association (AEA), the AAHE Assessment Forum, 
the FIPSE project directors, and approximately twenty other organizations. Participation 
in the OERI assessment audit and the editorship of the New Directions volume provided 
additional opportunities to describe the activities of the Center in published materials. 
The description of the role of the Center in supporting the assessment audit provided by 
Clifford Adelman in the introduction to Performance and Judgment and references to the 
Center's operations in the New Directions volume have helped to establish the ARC as 
a leader in the assessment movement. 

As in the second year of the Center's operation, the third year of the project 
included presentations at the annual meetings of the AAHE, AERA, AEA, the Association 
for Institutional Research (AIR), the FIPSE project directors, and at least twenty other 
organizations. In addition, the Center joined the electronic assessment bulletin board 
(Assessnet) and was listed by ERIC as a clearinghouse on assessment. Competitively 
selected papers and published articles also played an important role in enhancing the 
credibility of the Center. 

For the future, the Center's staff is focusing on presenting competitively selected 
papers at national meetings (AAHE, AEA, AERA, AIR) as a means of maintaining the 
current leadership position of the ARC (now Center for Assessment Research and 
Development). For example, staff members have proposed 11 papers, symposia, and/or 
workshops for these four national meetings in 1990-91. 

Objective 3: Provide information about assessment materials and procedures. 
Efforts to enhance the visibility of the Center also serve an educative function and help 
to meet the third program objective, the dissemination of information. Between 1986 
and 1989, presentations by the Center's staff have changed from descriptions of the 
assessment program at UTK to discussions of various paradigms for assessing student 
outcomes. 

In addition to making presentations, the ARC has responded to the need for 
information in two other ways: (1) establishing contact with interested parties by mail, 
telephone, and personal visits; and (2) conducting national workshops. During the three 
years of the project, the number of responses to requests for information and the number 
of workshop attendees have increased steadily. Over these three years two significant 
changes have occurred in the handling of direct contacts. First, from years one to three 
there has been a decrease in the number of contacts with visitors to the UTK campus and 
a much greater reliance on contacts by mail and telephone. The second change is that 
requests for reports and other assessment materials have increased. At the end of the 
second year of the project it became apparent that attempting to provide all of the 
materials requested would exhaust the Center's photocopying budget. As a result, 
individuals and institutions requesting information now are charged a fee sufficient to 
cover the costs of photocopying, mailing, and handling. 

Attendance at the ARC workshops has grown steadily during the three years of the 
project. Approximately 120 people attended the first Strategies for Assessing Outcomes 
workshop, and by the final year of the project more than 200 people attended the 
workshop. These workshops provide an introduction to assessment for institutions that 
are considering establishing assessment programs or are in the initial stages of 
implementing outcomes assessment. The content of the workshops has emphasized 
applied topics and attempted to provide ample opportunities for interaction between 
workshop presenters and participants. Schedules for the three workshops are included 
in Appendix D. 

Objective 4: Contribute to the discussion and solution of assessment issues and 
problems. In order to meet the fourth program objective and advance the practice of 
assessment, the ARC has organized specialized seminars/workshops on specific aspects of 
assessment each year. During the first year the Center organized what was expected to 
be a small workshop for institutional research staff on the institutional effectiveness 
criterion proposed by the Southern Association of Colleges and Schools (SACS). The 
workshop was intended to provide a showcase for solutions to assessment problems 
developed by the Offices of Institutional Research and Information Systems at UTK. This 
meeting, planned for 35-40, unexpectedly grew to a size of nearly 100 participants. 

Based on the experience with an open-enrollment seminar during the first year of 
the project, subsequent seminars were limited to a small number of invited participants. 
The topic for the seminar conducted during the second year was institutional experience 
with the ACT-COMP exam. Representatives from eight institutions participated in a 
round-table discussion with members of the ACT-COMP staff. This discussion was helpful 
in communicating campus concerns about the COMP exam and served to emphasize the 
need for ACT staff to provide more technical information about the test. 

During the third year of the project two seminars were held. The first dealt with 
procedures for developing institution-specific outcomes measures. Held in Princeton, New 
Jersey, this conference included representatives from seven institutions and drew on the 
expertise of the Educational Testing Service (ETS). One of the outgrowths of this 
meeting was a series of suggestions for developing tests in students' academic majors that 
subsequently was published by the ARC Associate Director in Assessment Update. 

The second conference held during the final year of the project took place at 
Alverno College in Milwaukee in February 1989. At this conference Stephen Dunbar 
from the Lindquist Center for Educational Measurement at the University of Iowa 
reviewed experiences with assessment at the K-12 level. Dunbar's conclusion that a 
primary weakness of K-12 assessment is that data are not used for program improvement 
helped provide the impetus for UTK's new FIPSE project. This conference produced the 
lead article for the Fall issue of Assessment Update. Lists of participants at all of the 
ARC seminars are presented in Appendix E. 

During the third year of the project the scope of activities for ARC staff was 
broadened to include an international dimension. A British consulting firm, H+E 
Associates, invited the University of Tennessee to co-sponsor an international conference 
on assessing quality in higher education. This event was held at Robinson College of 
Cambridge University in July 1989. In addition, FIPSE funds were used to support the 
work of John Harris, professor at David Lipscomb University in Nashville, TN, as the 
U.S. representative on the international Study Group on Evaluation of Higher Education 
sponsored by the Organization for Economic Cooperation and Development (OECD) in 
Paris. In April 1989 the project director contributed one of the background papers for 
the Study Group. 

RESULTS 

The Assessment Resource Center has effectively accomplished each of its four 
objectives. 

Objective 1. In addition to the Bibliography of Assessment Instruments and the 
listing of articles on assessment practice (Assessment Bibliography), the Center has 
produced 8 additional major reference works and 19 research reports. Tables of contents 
for the major reference works and a listing of the research reports are included in 
Appendix F. Edited versions of the bibliographies have been published in Performance 
and Judgment and the New Directions in Institutional Research volume, Implementing 
Outcomes Assessment: Promise and Perils. Copies of the chapters from Performance and 
Judgment are included in Appendix G. 

Audience response to these materials has been very positive. Through September 
15, 1989, individuals representing 158 institutions had requested copies of the latest 
versions of these materials. A detailed description of the characteristics of these requests 
is presented in Tables 1-3 of Appendix H. A survey of users was conducted in Spring 
1989 for the purpose of assessing perceived quality of the Center's publications. The 
scale used for the rating was a 5-point Likert Scale ranging from 1 = poor quality to 5 
= excellent quality. The average rating for all publications was 4.10, and the range of 
ratings was from 3.48 for Material on Locally Developed Tests to 4.49 for the Assessment 
Bibliography. Results of the evaluations of ARC publications are provided in Table 4 of 
Appendix H. 

Objective 2. Since 1986, the Center's staff have made 85 invited and/or 
competitively selected presentations on assessment. The number of articles published on 
assessment has increased from 5 in 1987 to 7 in 1988 to 12 in 1989. Center staff have 
visited campuses in 24 states and Puerto Rico to make presentations on assessment and 
to assist faculty in developing their own assessment plans. 

Objective 3. To date the ARC has responded to requests for information and 
distributed information to colleges and universities in 49 states, the District of Columbia, 
Puerto Rico, and five foreign countries. A recent review of the quality of the Center's 
efforts reveals that ARC users give a mean quality rating of 4.03 (on a 5-point scale) to 
these efforts. Results of the evaluations of the ARC's communication efforts overall are 
presented in Table 5 of Appendix H. 

As previously noted, attendance at the Center's workshop has increased from 120 
participants representing 20 states at the March 1987 workshop to 126 participants from 
31 states at the November 1987 workshop to 204 participants from 26 states at the 
November 1988 workshop. Analyses of workshop evaluations indicate that participants 
have consistently rated the overall quality of the workshops between 7 and 8 on a 10- 
point scale. Table 6 in Appendix H presents the participants' ratings from each of the 
ARC workshops. 

Objective 4. Evaluations of the quality of the seminars designed to improve 
assessment practice are more subjective. However, participants in these seminars have 
consistently indicated that the material presented was very useful. This can be seen in 
the evaluations of the workshop held in August 1987, where participants rated the overall 
quality of the workshop 7.3 on a 10-point scale. Ratings for this workshop are included 
in Table 7 of Appendix H. The letters of support from participants attending the three 
small workshops (see Appendix I) also provide evidence of the quality of these 
conferences. 

The International Conference on Assessing Quality in Higher Education at 
Cambridge University drew 45 participants representing 9 countries. The event was 
judged highly successful by all who were involved, and H+E Associates invited the 
University to co-sponsor a second such conference in 1990. 

John Harris completed his work with the OECD Study Group on Evaluation of 
Higher Education and submitted his final report to FIPSE in November 1989. An excerpt 
from that report appears in Appendix J. 

PLANS FOR CONTINUATION AND DISSEMINATION 
Most of the essential activities of the Assessment Resource Center are being 
continued under the auspices of the new UTK Center for Assessment Research and 
Development (CARD). The Bibliography of Assessment Instruments, the Assessment 
Bibliography, and the collection of assessment-related research reports will be 
continuously updated and disseminated. The first "Strategies for Assessing Outcomes" 
workshop supported solely by CARD was held in Knoxville on November 6-7, 1989. The 
182 enthusiastic participants, among whom were panelists from Alverno College, 
Northeast Missouri State University, and Miami-Dade Community College, as well as the 
University of Tennessee, Knoxville, provided evidence of the continuing viability of a 
medium-sized, highly interactive workshop that will address the questions and concerns 
of the newest practitioners of campus-based outcomes assessment. 

CARD staff have an abiding interest in continuous improvement of their activities, 
and will continue to pay close attention to careful evaluations of their materials and 
services. Continuing a practice begun at the first "Strategies" workshop in 1987, the 
November 1989 workshop in Knoxville concluded with an evaluation form administered 
to all participants. 

The demand for the printed materials developed by the Center continues to 
increase as the number of assessment practitioners grows and as more individuals and 
institutions learn of the Center's services. 

A new FIPSE grant received by CARD in September 1989 will enable 
representatives of seven diverse institutions in Tennessee to jointly develop and 
administer a survey for employers of the graduates of these institutions. Simultaneously, 
staff at the seven institutions will begin to study the implications of W. Edwards Deming's 
quality improvement philosophy for higher education. 

The one area of the Center's services that, unfortunately, cannot continue without 
external funding is the exploration of assessment issues by the consortium of campus- 
based assessment leaders established under ARC auspices. At its final meeting during the 
Assessment Forum in Atlanta in June 1989, this group outlined a series of some half- 
dozen topics for discussion that it wished to pursue. However, since campus funding for 
academic travel is limited, the group concluded that it could not meet again for an 
extended special session with a consultant unless another source of funding can be found. 

The responsibility for editing Assessment Update should maintain the visibility of 
Center staff and assist us in continuing to provide leadership with respect to the future 
development of the field of assessment in higher education. Moreover, sponsorship of a 
series of international conferences (the second to be held at St. Andrews University in 
Scotland in July 1990) will extend staff association with assessment to an international 
level. 



SUMMARY AND CONCLUSIONS 
When the concept of an Assessment Resource Center was proposed in 1986, there 
was no clear understanding of the extent to which assessment would continue to be an 
important national and international issue. We wondered if the introductory "Strategies" 
workshop would be needed after the first or second year of our work. 

Now it is obvious that the public interest in accountability in postsecondary 
education is sufficiently great that assessment will be an issue with high priority for years 
to come. At the November 1989 "Strategies for Assessing Outcomes" workshop in 
Knoxville, there was every indication that participants would welcome a similar offering 
in 1990. One of our most important sources of participants each year has been the 
colleagues of participants in prior years. 

The demand for materials and assistance in connection with the practice of 
campus-based outcomes assessment is another indication of the steadily growing 
interest in assessment. The very favorable reaction of colleagues around the 
world to the ARC materials has given us the impetus to continue to develop the 
collection. In addition, we hope that we will continue to be included in national and 
international discussions of the future of assessment and can make a significant 
contribution to the literature of assessment. The large number of students involved in 
assessment, the diversity of departments that have undertaken their own assessment 
activities, and the involvement of faculty in a variety of disciplines in the pursuit of 
assessment-related research virtually assure the continued leadership role of the University 
of Tennessee, Knoxville in the advancement of knowledge about the practice of outcomes 
assessment. We hope to use that role to chart a wise course for assessment research and 
development in future years. 

The Assessment Resource Center 
at the 

University of Tennessee, Knoxville 

Funded by the University and a grant from 
The Fund for the Improvement of 
Postsecondary Education 
U.S. Department of Education 



9/15/86 - 9/15/89 



Purpose: To provide leadership and assistance for colleges and universities throughout the country interested in 
assessing the outcomes of higher education. 

A Summary of Activities of the Center Staff & University Colleagues 

* Preparation of collections of useful materials, bibliographies of assessment instruments, and summaries of successful 
institutional practices 

* Publication of numerous articles, several books, and a national newsletter (in association with Jossey-Bass) 

* Distribution of information to 1400 colleges, universities, and organizations in 49 states and 6 other countries 

* Conduct of workshops and seminars 

-March 1987 workshop in Nashville - 120 participants from 20 states 
-August 1987 workshop in Knoxville - 95 participants from 12 states 
-November 1987 workshop in Memphis - 126 participants from 31 states 
-November 1988 workshop in Knoxville - 204 participants from 29 states 

* Presentations at international, national and regional meetings 

-ACT Conference Series on Assessment 
-American Association for Higher Education 
-American Association of State Colleges and Universities 
-American Educational Research Association 
-Association for Handicapped Student Services Programs in Postsecondary Education 
-Association for Institutional Research 
-Association for the Study of Higher Education 
-Association of American Colleges 
-Bryn Mawr Summer Institute for Women in Higher Education Administration 
-California State University System 
-Eastern Sociological Association 
-John Dewey Society 
-National Association of Student Personnel Administrators 
-National Association of Colleges and Teachers of Agriculture 
-National Conference on Assessment in Higher Education 
-National Education Association 
-Organization for Economic Cooperation and Development - Paris 
-Society for College and University Planning 
-Southeastern Association for Institutional Research 
-Southern Association for Institutional Research 
-Southern Association of Colleges and Schools 

* Consultation with national and regional organizations 

-Administrators of Accounting Programs 
-American Association for Higher Education 
-American College Testing Program 
-ASHE/ERIC - Consulting Editor 
-Association for the Study of Higher Education 
-Association of American Colleges 
-Kansas City Regional Council for Higher Education 
-National Association of State Universities and Land-Grant Colleges 
-Pennsylvania Council of Academic Deans 
-Southern Association of Colleges and Schools 
-UCLA-FIPSE Value-Added Consortium 
-U.S. Department of Ed. - FIPSE and Office of Ed. Research & Improvement 

* Consultation with colleges and universities in 21 states, including: 

-California State University, Los Angeles 
-University of Alabama, Tuscaloosa 
-University of Mississippi 
-University of Missouri, Columbia 
-University of North Carolina, Chapel Hill 
-University of Wisconsin, Madison 

* Conduct of campus visits for representatives of institutions in: 

Australia, Canada, The Netherlands, West Germany, and 14 states 

* Conduct of special topics seminars for assessment leaders: 

-April 1988 in Kansas City - "Users' problems in using the ACT COMP exam" for 
institutional representatives from 7 states and ACT COMP staff 

-October 1988 at ETS in Princeton, N.J. - "Problems in developing tests in the major" for 11 assessment leaders 

-February 1989 - "Implications of K-12 competency testing for assessment in 
higher education" for 12 assessment leaders 

Recognition Received by UTK Assessment Program 

* National Council on Measurement in Education Triennial Award for Excellence in Using Educational 
Measurement Technology (1984) 

* Selected to present the United States Case Study on the Assessment of Institutional Effectiveness at the 
annual meeting in Paris of the Organization for Economic Cooperation and Development (1986) 

* Proposal for Assessment Resource Center was one of 75 selected for funding by FIPSE from 2100 
proposals submitted (1986) 

* Selected as sole source contractor for the Assessment Audit funded by the U.S. Department of 
Education (1987) 

* Director honored by the American Association for Higher Education for contributions to the field of 
assessment 

* Invited by Jossey-Bass Publishers of San Francisco to edit the nation's first quarterly publication in the 
field of assessment (1988) 

* Featured in CNN interview (1987) 

* Participant in national teleconference on assessment (1988) 

* Subject of features published in Le Monde, the New York Times, the Chronicle of Higher Education, 
Change, and the international Journal of Institutional Management in Higher Education 

Quick Summary of Published Work on Assessment by ARC Personnel 

1984 

* Article in Educational Measurement 

1985 

* Chapter in New Directions for Higher Education volume 
* Chapter in New Directions for Institutional Research 

1986 

* Book: Performance Funding in Higher Education: A Critical Analysis 
* Article in Educational Record 
* Article in Conference Proceedings of meeting co-sponsored by AASCU and George Mason University 
* Article in Journal of Institutional Management in Higher Education 

1987 

* Article in The Chronicle of Higher Education 
* Chapter in New Directions for Higher Education 
* Article in State Education Leader 
* Article in Virginia Community College Journal 
* Article in Yearbook of American Universities 

1988 

* Article in Journal of Higher Education 
* Edited volume in New Directions for Institutional Research series 
* Chapter in Performance and Judgment published by the U.S. Office of Ed. Research & Improvement 
* Article in Research in Higher Education 
* Article in Proceedings of the annual meeting of the John Dewey Society 
* Chapter in Yearbook of American Universities and Colleges. New York: Garland 
* Consultant for article in Changing Times 
* Editor for quarterly newsletter on assessment for Jossey-Bass Publishers, San Francisco 

1989 

* Several articles in Assessment Update: Progress, Trends, and Practices in Higher Education 
* Chapter in The Theory and Practice of Outcomes Assessment. S. J. Reithlingshoefer (Ed.) 
* Two articles in Research in Higher Education 
* Article in Review of Higher Education Research 
* Proceedings of the 1989 International Conference on Assessing Quality in Higher Education 

CAMPUS VISITS 
1985-1989 



Alabama 

Samford University - 1989 

California 

Cal State, Los Angeles - 1987 

Connecticut 

University of Connecticut - 1988 

Georgia 

Berry College - 1988 
Shorter College - 1989 

Hawaii 

University of Hawaii, Manoa - 1988 
Community College System - 1988 

Illinois 

Rosary College - 1988 
Triton College - 1985 

Indiana 

University of Southern Indiana - 1987 

Kentucky 

Berea College - 1987 
Murray State University - 1988 
University of Louisville - 1986 
Kentucky State University - 1988 

Louisiana 

Louisiana Tech - 1989 

Michigan 

Saginaw Valley State College - 1985 
Wayne State University - 1986 

Missouri 

Northeast Missouri State University - 1985 
Southwest Missouri State University - 1988 

Mississippi 

Mississippi University for Women - 1987 
Jackson State University - 1988 

New Jersey 

Kean College - 1987, 1988, 1989 

North Carolina 

University of North Carolina, Charlotte - 1987 
Fayetteville State University - 1987 
Forsyth Technical Community College - 1989 



Ohio 

University of Dayton - 1985 
University of Toledo - 1988 

Oklahoma 

Northeastern State University - 1989 

Pennsylvania 

Gannon University - 1985 
King's College - 1985 
Philadelphia College of Pharmacy & Science - 1989 
University of Scranton - 1989 

Puerto Rico 

Humacao University College of the University 
of Puerto Rico - 1989 

Rhode Island 

Rhode Island College - 1985, 1987, 1989 
Rhode Island Community College - 1989 

South Carolina 

South Carolina State College - 1987 
Winthrop College - 1988 
University of South Carolina - 1989 
Midlands Technical College - 1989 

South Dakota 

University of South Dakota, Vermillion - 1986 

Tennessee 

Memphis State University - 1988 
Roane State Community College - 1987 
Milligan College - 1989 
University of TN, Chattanooga - 1989 

Texas 

Southwest Texas State University - 1985 
University of Houston - 1987 

Virginia 

Clinch Valley College - 1987, 1989 
James Madison University - 1985, 1989 
Old Dominion University - 1986 
Marymount University - 1988 
Virginia Highlands Community College - 1988 
Mountain Empire Community College - 1988 
Wytheville Community College - 1987 

Washington 

Central Washington State University - 1987 

Before beginning assessment programs, institutions should 
carefully examine the literature on assessment practice, 
including the experiences of other institutions. 



An Annotated Bibliography 
and Program Descriptions 

Gary R. Pike 



Over the last five years, interest in assessing student educational outcomes 
has increased dramatically in this country. This growing interest has 
been fueled in large part by state laws and accreditation requirements. 
There has also been a parallel growth in the literature on assessment. In 
view of the variety of information currently available, this bibliography 
is not exhaustive. Instead, it provides a starting point for the study of 
assessment. 

T. W. Banta (ed.). Implementing Outcomes Assessment: Promise and Perils. 
New Directions for Institutional Research, no. 59. San Francisco: Jossey-Bass, Fall 1988. 

Sources on Assessment 

Adelman, C. P. (ed.). Assessment in American Higher Education: Issues and 
Contexts. Washington, D.C.: U.S. Government Printing Office, 1985. 

This series of essays, based on presentations made at the First 
National Conference on Assessment in Higher Education, examines 
assessment from a variety of perspectives, including the philosophy of 
assessment, assessment in professional/technical schools, selection of 
assessment instruments, and evaluating the costs of assessment. 

Astin, A. W. Achieving Educational Excellence: A Critical Assessment of 
Priorities and Practices in Higher Education. San Francisco: Jossey-Bass, 
1985. 

Astin examines the concept of quality in higher education. He 
begins by identifying some limitations of the traditional indicators of 
quality, and he advocates the use of a talent-development (value added) 
approach to assessing quality. Using data from the Cooperative 
Institutional Research Program (CIRP), Astin suggests several steps institutions 
can take to promote educational quality. 

Banta, T. W. (ed.). Performance Funding in Higher Education: A Critical 
Analysis of Tennessee's Experience. Boulder, Colo.: National Center for 
Higher Education Management Systems, 1986. 

Tennessee's performance-funding initiative serves as the focal 
point of the essays in this volume. The topics include policy issues (such 
as the development of performance-funding criteria and the use of data 
in institutional planning) and measurement issues (such as using surveys 
to measure student satisfaction, and testing achievement in general edu- 
cation and in the major). 

Bergquist, W. H., and Armstrong, J. R. Planning Effectively for Educa- 
tional Quality: An Outcomes-Based Approach for Colleges Committed to 
Excellence. San Francisco: Jossey-Bass, 1986. 

In providing a model for improving educational quality, the 
authors urge institutions to examine their missions, develop pilot pro- 
grams designed to assist in accomplishing those missions, assess the effec- 
tiveness of the pilot programs, and then implement large-scale programs. 
They stress the importance of incorporating outcomes data into the plan- 
ning process. 

Berk, R. A. (ed.). Performance Assessment: Methods and Applications. 
Baltimore, Md.: Johns Hopkins University Press, 1986. 

This volume is a basic reference work on performance assessment. 
Authors of the essays present a variety of methods, ranging from behavior 
rating to assessment centers. The authors also show uses of performance 
assessment in business, medicine, law, teaching, and evaluation of com- 
munications skills. 

Boyer, C. M., Ewell, P. T., Finney, J. E., and Mingle, J. R. "Assessment 
and Outcomes Measurement: A View from the States." AAHE Bulletin, 
1987, 39 (7), 8-12. 

These authors report the results of a survey conducted by the 
Education Commission of the States. The purpose of the survey was to 
identify trends in state-promoted assessment. Results showed that 
approximately two-thirds of the states have established or are establishing 
programs that provide incentives for assessment efforts. 

Educational Testing Service. Assessing the Outcomes of Higher Education: 
Proceedings of the 1986 ETS Invitational Conference. Princeton, N.J.: 
Educational Testing Service, 1987. 

Topics addressed in the papers presented at this invitational con- 
ference on assessment include the responses of accrediting associations, 
state agencies, and colleges and universities to the assessment movement; 
the use of unobtrusive measures in assessing student outcomes; and the 
role of value-added analyses in assessing educational outcomes. 

El-Khawas, E. "Colleges Reclaim the Assessment Initiative." Educational 
Record, 1987, 68 (2), 54-58. 

In this article, the author summarizes the results of a recent survey 
conducted by the American Council on Education (ACE). The author 
argues that colleges and universities are taking the lead in promoting the 
assessment of student educational outcomes. According to the ACE survey, 
colleges and universities are developing campus assessment programs 
designed to improve campus planning, rather than to satisfy external 
mandates. 

Ewell, P. T. "Assessment: Where Are We?" Change, 1987, 19 (1), 23-28. 

While Ewell briefly examines efforts of private institutions to 
assess student outcomes, the primary focus is on the response of public 
institutions to state mandates. Ewell describes several state-mandated 
approaches and institutional responses to them, as well as several con- 
cerns arising from the trend toward state-mandated assessment. 

Gronlund, N. E. Constructing Achievement Tests. (3d ed.) Englewood 
Cliffs, N.J.: Prentice-Hall, 1982. 

The author provides a basic introduction to constructing achieve- 
ment tests. Gronlund discusses all phases of test construction and eval- 
uation, including specification of educational objectives, development 
of test items, and evaluation of test items. Gronlund also identifies dif- 
ferences in construction and scoring between objective and essay 
examinations. 

Henerson, M. E., Morris, L. L., and Fitz-Gibbon, C. T. How to Measure 
Attitudes. Newbury Park, Calif.: Sage, 1983. 

These authors describe a variety of approaches for measuring atti- 
tudes, including surveys, interviews, and observations of behavior. They 
also discuss development, reliability, and validity of instruments. 

Marchese, T. J. "Third Down, Ten Years to Go." AAHE Bulletin, 1987, 
40 (4), 3-8. 

Marchese traces the assessment movement over the last three years. 
The author describes six approaches to assessing educational outcomes: 
the assessment center, assessment as learning, assessment as program monitoring, 
assessment of student development, assessment as standardized 
testing, and assessing through a senior examiner. 

Representative Assessment Programs 

The growing interest in educational outcomes has produced a 
variety of approaches to assessment. The following organizations engage 
in ongoing assessment. This list represents only a sampling of current 
programs, but it covers the variety of approaches being used at this time. 

Alverno College 

Judeen Schute, Alverno College, 3401 South 39th Street, Milwaukee, 
WI 53215; (414)382-6000. Designs and implements general educational 
outcomes assessment. 

American Association for Higher Education 

Patricia Hutchings, American Association for Higher Education, 
One Dupont Circle NW, Suite 600, Washington, DC 20036; (202)293- 
6440. Convenes annual forum, supports descriptive studies of assessment, 
and provides referral service. 

Association of American Colleges 

Carol Schneider, Association of American Colleges, 1818 R Street 
NW, Washington, DC 20009; (202)387-3860. Coordinates assessment pro- 
grams that rely on visiting examiners. 

City University of New York, Research Foundation 

Harvey S. Wiener, Professor of English, CUNY/Research Founda- 
tion, 309 Clearview Lane, Massapequa, NY 11758; (516)799-1951. Assess- 
ment of word processing and writing effectiveness. 

Educational Testing Service 

Roy Hardy, Director, Educational Testing Service, 250 Piedmont 
Avenue NE, Suite 1240, Atlanta, GA 30308; (404)524-4501. Develops item 
banks for assessment of learning outcomes in five disciplines. 

Harvard Medical School 

Gordon Moore, Director, New Pathway Project, Harvard Medical 
School, 25 Shattuck Street, Boston, MA 02115; (617)732-0634. Conducts 
comparative assessments of traditional and medical school curriculum 
models. 

Harvard University 

Richard Light, Professor, School of Education, Harvard Univer- 
sity, Cambridge, MA 02138; (617)495-1183. Manages cooperative pilot 
project involving assessment at selective institutions. 

Indiana University of Pennsylvania 

Robert Millward, Director, Pre-Teacher Assessment Center, Indiana 
University of Pennsylvania, 136 Stouffer Hall, Indiana, PA 15705; 
(412)357-2480. Establishes pre-teacher assessment in new center that eval- 
uates teaching abilities using classroom simulations. 

James Madison University 

Dary Erwin, Office of Student Assessment, James Madison University, 
Harrisonburg, VA 22801; (703)568-6211. Conducts assessment in seven 
broad areas: major, general education, interdisciplinary objectives, affec- 
tive development, functional skills, alumni, and environment. 

Kean College 

Michael Knight, Donald Lumsden, Assessment of Student Learning 
and Development, Kean College of New Jersey, Union, NJ 07083; 
(201)527-2000. Uses faculty-developed outcomes assessment in each 
program area. 

King's College 

D. W. Farmer, Vice-President and Dean of Academic Affairs, 
King's College, Wilkes-Barre, PA 18711; (717)826-5900. Uses outcomes-oriented 
curriculum, complemented by course-embedded assessment program; 
emphasis on "transferable skills of liberal learning" linked with 
progress in major. 

Miami University of Ohio 

Karl Schilling, Associate Dean, Western College Program, Miami 
University, Oxford, OH 45056; (513)529-1809. Employs comparative assess- 
ment of discipline-based and interdisciplinary undergraduate curricula at 
Miami University of Ohio. 

Northeast Missouri State University 

Charles J. McClain, President; Darrell Krueger, Dean of Instruction; 
Administration/Humanities Building, Northeast Missouri State 
University, Kirksville, MO 63501; (816)785-4100. Employs value-added 
approach using standardized tests and uses surveys to assess student 
growth and evaluate the university. 

Ohio Board of Regents 

Elaine H. Hairston, Vice-Chancellor, Academic and Special Programs, 
Ohio Board of Regents, 30 E. Broad Street, 36th Floor, Columbus, 
OH 43266; (614)466-6000. Promotes excellence, stimulates assessment, 
and communicates program and institutional improvements to external 
agencies, on the basis of a statewide assessment project. 

Rhode Island College 

William Enteman, Provost and Vice-President for Academic 
Affairs, Rhode Island College, Providence, RI 02908; (401)456-8003. Devel- 
ops assessment activities linked to curriculum revisions, new advising 
systems, and individualized educational plans. 

South Dakota State University 

Kris Smith, Assessment and Testing Office, South Dakota State 
University, Administration Building, Room 215, Brookings, SD 57007; 
(605)688-4217. Reviews general educational outcome assessment instru- 
ments after trial use; also uses a variety of subject area tests and surveys. 

Southern Association of Colleges and Schools 

Carol A. Luthman, Assistant Executive Director, Commission on 
Colleges, Southern Association of Colleges and Schools, 795 Peachtree 
Street NE, Atlanta, GA 30365; (404)847-6120. Develops manuals describ- 
ing college use of outcomes assessment during the accreditation process. 

State University of New York, Plattsburg 

Thomas Moran, Assistant Vice-President for Academic Affairs, 
SUNY/Plattsburg, Plattsburg, NY 12901; (518)564-2080. Develops new 
assessment procedures as alternatives to nationally standardized tests. 

Texas College and University System 

Mary Griffith, Project Director, Community College and Technical 
Institute Division, Coordinating Board of the Texas College and Univer- 
sity System, P.O. Box 12788, Houston, TX 78711; (512)475-0718. Defines 
college-level skills in reading, writing, and mathematics for subsequent 
adoption by Texas postsecondary system. 

University of Kentucky 

Charles Elton, Karen Carey, College of Education, University of 
Kentucky, 111 Dickey Hall, Lexington, KY 40506; (606)257-2627. Studies 
value-added approaches to assessing institutional effectiveness. 

University of Massachusetts 

José P. Mestre, Department of Physics and Astronomy, University 
of Massachusetts, Amherst, MA 01003; (413)545-2040. Researches and 
develops computer-assisted problem-solving skills (bilingual: English and 
Spanish). 

University of New Mexico 

Scott Obenshain, School of Medicine, University of New Mexico, 
P.O. Box 508, Albuquerque, NM 87131; (505)277-4823. Currently developing 
a self-assessment center for medical students. 

University of Tennessee, Knoxville 

Trudy W. Banta, Gary R. Pike, The Assessment Resource Center, 
University of Tennessee, 2046 Terrace Avenue, Knoxville, TN 37996-3504; 
(615)974-0883. Conducts a comprehensive assessment program involving 
testing in the major and in general education, and surveying students 
and alumni. Results are used in program reviews and institutional plan- 
ning and improvement. Conducts workshops and disseminates informa- 
tion about assessment to other colleges and universities. 



Gary R. Pike is associate director of the Assessment Resource 
Center at the University of Tennessee, Knoxville. 

INTRODUCTION 



In 1976, the American College Testing Program (ACT) organized the 
College Outcome Measures Project (COMP) to develop a measure of "knowledge 
and skills relevant to successful functioning in adult society" (Forrest, 
1982, p. 11). Available since 1979-80, the COMP exam has been administered 
at least once at more than 500 colleges, and it is used annually by approx- 
imately 100 four-year institutions in the evaluation of their general 
education programs (American College Testing Program, 1987). Until re- 
cently, the COMP exam was the only instrument designed for evaluating 
general education programs, and in 1989 it remains the only measure for 
which a substantial amount of data is available. 

The COMP exam is available in two forms: the Objective Test (con- 
sisting of multiple-choice items) and the Composite Examination (containing 
multiple-choice questions, along with exercises requiring students to write 
essays and record speeches). ACT reports that the correlation between the 
two forms of the exam is .80, allowing the Objective Test to serve as a 
proxy for the Composite Examination (Forrest & Steele, 1982). Most insti- 
tutions, including the University of Tennessee, Knoxville, use the Objec- 
tive Test for program evaluation because it is easier to administer and 
score (Banta, Lambert, Pike, Schmidhammer, & Schneider, 1987). 

The Objective Test takes approximately 2½ hours to administer and 
contains 60 questions, each with two correct answers. The questions are 
divided among 15 separately timed activities drawing on material (stimuli) 
from television programs, radio broadcasts, and print media. Students 
taking the COMP Objective Test are instructed that there is a penalty for 
guessing (i.e., incorrect answers will be subtracted from their scores), 
but that leaving a question blank will not be counted against them. The 
combination of two correct answers for each question, the guessing penalty, 
and no penalty for not answering a question means that the score range for 
each of the 60 items is from -2 to 2 points. A score of -2 represents two 
incorrect answers, while a score of -1 represents one incorrect answer and 
one left blank. A score of 0 can represent either both answers left blank 
or one correct and one incorrect answer. A score of 1 represents one 
correct answer and a blank, and a score of 2 represents two correct an- 
swers. For interpretability, these scores are rescaled (0 to 4), making 
the maximum possible score on the Objective Test 240 points and a chance 
score 120 points. 
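
The scoring rule described above can be sketched in a few lines of Python. This is only an illustration under the assumptions stated in the text (two answers per item, one point gained per correct answer, one point lost per incorrect answer, blanks ignored); the function names are hypothetical and are not taken from ACT materials.

```python
ITEMS_ON_TEST = 60  # number of questions on the Objective Test, per the text


def raw_item_score(correct: int, incorrect: int) -> int:
    """Raw score for one item: +1 per correct answer, -1 per incorrect answer.

    Blank answers contribute nothing, so the result ranges from -2 to 2.
    """
    if correct < 0 or incorrect < 0 or correct + incorrect > 2:
        raise ValueError("each item allows at most two answers")
    return correct - incorrect


def rescaled_item_score(correct: int, incorrect: int) -> int:
    """Shift the raw score from the -2..2 range onto the reported 0..4 range."""
    return raw_item_score(correct, incorrect) + 2


def total_score(responses) -> int:
    """Sum rescaled item scores over (correct, incorrect) pairs, one per item."""
    return sum(rescaled_item_score(c, i) for c, i in responses)


all_correct = [(2, 0)] * ITEMS_ON_TEST
all_blank = [(0, 0)] * ITEMS_ON_TEST
print(total_score(all_correct))  # 240
print(total_score(all_blank))    # 120
```

An answer sheet left entirely blank and a sheet with one correct and one incorrect answer on every item both total 120, the value the text gives as the chance score.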

New forms of the COMP Objective Test are developed on an annual basis. 
In order to ensure comparability of scores across forms, the COMP staff 
equates each new form to the original test (Form III). This equating is 
done using samples of high school and college seniors who are double-tested 
using Form III and the new form of the test. Statistical procedures involve 
the use of Angoff's (1984) Design II (Steele, J. M., personal communication, 
14 September 1989). 
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In addition to a total score, the COMP Objective Test provides three 
content subscores (Functioning within Social Institutions, Using Science 
and Technology, and Using the Arts) and three process subscales (Communi- 
cating, Solving Problems, and Clarifying Values). In the technical manual 
for the COMP exam, ACT staff report that the alpha reliability of the total 
score is .84, and that reliability estimates for the subscores range from 
.63 to .68 (Forrest & Steele, 1982). Estimates of parallel-forms reliability 
are .79 for the total score and range from .53 to .68 for the 
subscores. More recently, Steele (1988) has argued that the reliability 
coefficients for group means on the COMP exam are .98 for the total score 
and range from .97 to .98 for the subscores. 

Many colleges and universities are drawn to the COMP exam because it 
offers to provide objective evidence of student intellectual growth (value 
added) over the course of a college education. Students who persist at an 
institution can be tested upon entrance and again at the end of two or four 
years of college in order to determine the growth attributable to their 
educational experiences (Forrest, 1982). Partly because many institutions 
are unwilling to wait two or four years to evaluate student learning, COMP 
staff provide an estimate of student gain. Based on the fact that the 
correlation between total scores on the Objective Test and entering ACT 
Assessment Composite scores is .70, the COMP staff have constructed a 
concordance table from which institutions may estimate mean freshman COMP 
scores if they have mean ACT Assessment Composite scores (or mean SAT 
scores) (Banta et al., 1987). By subtracting the estimated freshman score 
from the actual score for graduating students, an estimate of score gain, 
or value-added, can be obtained. 
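
The estimated-gain calculation described above amounts to a table lookup followed by a subtraction. The sketch below illustrates the arithmetic only: the concordance values are invented placeholders, not figures from the ACT table, and the function and variable names are hypothetical.

```python
# Hypothetical concordance: mean ACT Assessment Composite -> estimated mean
# freshman COMP total score. These numbers are placeholders for illustration
# and do NOT come from the actual ACT concordance table.
HYPOTHETICAL_CONCORDANCE = {
    18: 170.0,
    19: 172.5,
    20: 175.0,
    21: 177.5,
    22: 180.0,
}


def estimated_freshman_comp(mean_act: float, table=HYPOTHETICAL_CONCORDANCE) -> float:
    """Look up the estimated freshman COMP mean for a mean ACT Composite score."""
    key = round(mean_act)  # assume the table is indexed by whole ACT points
    if key not in table:
        raise KeyError(f"no concordance entry for ACT Composite {key}")
    return table[key]


def estimated_gain(mean_senior_comp: float, mean_act: float,
                   table=HYPOTHETICAL_CONCORDANCE) -> float:
    """Estimated gain = actual senior COMP mean - estimated freshman COMP mean."""
    return mean_senior_comp - estimated_freshman_comp(mean_act, table)


# Seniors averaging 186.0 on the COMP who entered with a mean ACT of 20.4
print(estimated_gain(186.0, 20.4))  # 11.0 with the placeholder table
```

Because the freshman score is estimated rather than observed, any error in the concordance carries directly into the gain estimate.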

The University of Tennessee, Knoxville (UTK) has the most extensive 
institutional COMP database in the country. Since 1980 several hundred 
seniors at UTK have been tested annually using the COMP Objective Test. In 
1985, the test became a graduation requirement for every senior, and the 
test is annually administered to a sample of approximately one-half of the 
freshman class. When students take the COMP exam at UTK they are asked to 
complete an extensive survey designed to gather information about their 
previous educational experiences. These responses are combined with back- 
ground data from student records and test scores to produce the UTK data- 
base. As of 1989, the UTK database on the COMP exam contains test scores, 
background data, and survey responses for more than 20,000 students. 

Because UTK is in a unique position to provide evidence of the techni- 
cal quality of the COMP exam, it has undertaken a variety of research 
projects. In conducting research on the COMP exam, it was impossible for 
UTK to divorce itself from the context in which COMP scores are used -- the 
Tennessee performance funding program. The results of a representative 
sample of these projects are included in this publication. As a guide, an 
annotated bibliography of this research is provided. 
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ANNOTATED BIBLIOGRAPHY 

Banta, T. W., Lambert, E. W., Pike, G. R., Schmidhammer, J. L., & 
Schneider, J. A. (1987). Estimated student score gain on the ACT COMP exam: 
Valid tool for institutional research? Research in Higher Education, 27, 
195-217. 

In this article, the authors examine issues related to the reliability and 
validity of a measure of the value added by education based on estimated 
score gains on the ACT COMP exam. Using data from the 1985 (N=843) and 
1986 (N=2226) academic years, several problems with the reliability and 
validity of estimated gain scores are identified. Concerns related to 
reliability include: (1) the large standard deviation of gain scores; 
(2) the significant differences between estimated and actual gain scores; 
and (3) the unreliability of difference scores as indicators of change. 
Concerns related to validity include: (1) the significant negative correla- 
tion between pretest scores (ACT Assessment Composite) and estimated gain; 
(2) the systematic exclusion from estimates of gain of subgroups due to the 
absence of ACT Assessment scores for those groups; and (3) the presence of 
counter-intuitive relationships between estimated gain scores and measures 
of students' educational experiences. 

Banta, T. W., & Pike, G. R. (in press). Methods of comparing outcomes 
assessment instruments. Research in Higher Education. 

The purpose of this article is to outline a strategy for use by faculty in 
comparing the relative efficacy of outcomes assessment instruments in 
gauging program effectiveness. The methods described by the authors in- 
clude: (1) asking faculty to compare the content of the instruments with 
objectives for the general education program; (2) asking students about 
their perceptions of the instruments; and (3) analyzing the psychometric 
properties of the instruments. Based on analyses of the ACT-COMP exam and 
the Academic Profile, the authors conclude that neither test covers more 
than 30 percent of the general education goals defined as important at UTK. 
Furthermore, students tend to rate both tests as poor measures of their 
general education experiences, and analyses of the psychometric properties 
of these tests indicate that both are primarily measures of individual 
differences. 






Phillippi, R. H. (1989). A comparison of reliability and difficulty levels 
for three forms of the COMP (Research Report No. 89-02). Knoxville, TN: 
University of Tennessee, Center for Assessment Research and Development. 

Based on data from Forms 7, 8, and 9 of the COMP Objective Test, the author 
concludes that the most likely response on a given item on the COMP exam by 
chance alone is one right and one wrong. A score on the test, therefore, 
is actually a measure of the number of items for which both responses are 
correct. Based on this type of scoring, a reliability analysis of Forms 7, 
8, and 9 shows little change across forms. Analysis of item difficulties 
shows that all three forms of the test are excessively easy for seniors, 
although Forms 7 and 8 are reasonably difficult for freshmen. Form 8 of 
the test was the most difficult form of the test for seniors. However, 
seniors also scored highest on form 8. This paradox can be explained by 
the equating process used for the COMP exam. (UTK students score 
significantly higher than the equating population on this form of the 
test.) The net effect is a two-point gain in equated scores for students 
at UTK. 



Phillippi, R. H. (1989). Analysis of the impact of social sciences 
coursework on ACT-COMP scores (Research Report No. 89-01). Knoxville, TN: 
University of Tennessee, Center for Assessment Research and Development. 

The research presented in this report was designed to assess the impact of 
differential patterns of social science coursework on total score and all 
subscale scores of the ACT-COMP exam. Using COMP scores for more than 
10,000 seniors tested at UTK, the author found that social science 
coursework significantly affects scores on the Functioning within Social 
Institutions, Using Science and Technology, Using the Arts, Communicating, 
and Clarifying Values subscales. Social science coursework did not influ- 
ence Solving Problems scores. In every case, the magnitude of these ef- 
fects is quite small. The author concludes: "It appears that the validity 
of total score on the ACT-COMP for measuring the effects of education is 
questionable. Education, as measured in coursework, seems to effect 
subscores on the ACT-COMP, but those effects seem to be counterbalanced by 
opposite effects with other subscores. All of this results in total scores 
which do not reflect differences on the basis of education." 
Pike, G. R. (1989). A comparison of the College Outcome Measures Program 
(COMP) and the Collegiate Assessment of Academic Proficiency (CAAP) exams 
(Research Report No. 89-10). Knoxville, TN: University of Tennessee, Center 
for Assessment Research and Development. 

Using the concepts of the substantive, structural, and external components 
of construct validity, the author investigates the use of the COMP and CAAP 
exams at UTK. Although the COMP exam was slightly superior to the CAAP in 
its coverage of general education content, neither test covered more than 
30% of UTK's general education goals. In addition, neither test provides 
highly dependable measures of educational outcomes. This research also 
raises serious questions about the appropriateness of the scoring model 
used by these exams, both at the item level and in the construction of 
subscales. The most disappointing result of this research is the insensi- 
tivity of the COMP and CAAP exams to patterns of general education 
coursework. The variables that do predict successful performance on the 
COMP and CAAP exams are the students' levels of ability when they enter 
college, self-reports of how hard they try on the tests, and to a lesser 
extent, their gender and grade point averages in college. 



Pike, G. R. (1989). Students' background characteristics, educational 
experiences, educational outcomes: A model for evaluating assessment 
instruments. In S. J. Reithlingshoefer (Ed.), Developing assessment 
partnerships between community and senior colleges: The theory and practice 
of outcomes assessment (pp. 79-91). Fairfax, VA: George Mason University. 

The research presented in this paper investigates the criterion-related 
validity (instructional sensitivity) of the ACT-COMP exam. Two require- 
ments for criterion-related validity are proposed and tested: (1) test 
scores should be more strongly related to educational experiences than to 
background characteristics; and (2) subscores should be differentially 
related to students' educational experiences. Two conclusions regarding 
the validity of the COMP exam emerge from this research on 2523 seniors: 
(1) while COMP scores are significantly related to the total amount of 
coursework, the explanatory power of this relationship, relative to the 
power of relationships between COMP scores and background characteristics, 
is quite small; and (2) subscores on the COMP exam are not differentially 
related to educational experiences, indicating that these subscores provide 
no unique information for program planning and evaluation. 
Pike, G. R. (1989). The performance of black and white students on the 
ACT-COMP exam: An analysis of differential item functioning using Samejima's 
graded model (Research Report No. 89-11). Knoxville, TN: University of 
Tennessee, Center for Assessment Research and Development. 

In this research report, the author describes the results of a study 
designed to evaluate differential item functioning for 471 black and 9237 
white students who completed Form 8 of the COMP Objective Test. Using item 
response theory, the author found that 58% of the questions on the 
Objective Test produced significant levels of difficulty-shift in favor of 
whites. No instances of significant difficulty-shift in favor of blacks were 
found. The author notes that rates of differential item functioning are 
not evenly distributed across COMP subscales, and when items are catego- 
rized on the basis of content skills (identification versus explanation), 
questions designed to measure explanation produce higher rates of difficul- 
ty shift. In addition, the author suggests that questions which are based 
on unfamiliar stimuli may produce high rates of differential item func- 
tioning. 



Pike, G. R. (1989). Background characteristics, college experiences, and 
the ACT-COMP exam: Using construct validity to evaluate assessment 
instruments. Review of Higher Education, 13, 91-117. 

This article presents one criterion for judging the appropriateness of 
standardized tests as assessment instruments. To be valid indicators of 
program effectiveness, test scores should be related to factors associated 
with educational quality and should discriminate between these indicators 
and variables beyond the control of higher education (e.g., demographic 
characteristics). Research on the appropriateness of the ACT-COMP exam for 
program evaluation was conducted using data from 5936 seniors. This re- 
search provides mixed results. Specifically, results indicate that the 
ACT-COMP exam is a better measure of individual differences (entering 
academic ability/achievement) than of program quality. While the COMP exam 
is primarily a test of individual differences, it is sensitive to indica- 
tors of program quality (patterns of coursework) once effects for back- 
ground characteristics are removed statistically. 
Pike, G. R., & Banta, T. W. (1989, March). Using content validity to 
evaluate assessment instruments: A comparison of the ACT-COMP exam and the 
ETS Academic Profile. Paper presented at the annual meeting of the American 
Educational Research Association, San Francisco. 

Using Messick's work on construct validity, the authors examine the 
substantive, structural, and external components of test use by the 
Tennessee Higher Education Commission (THEC) and the University of 
Tennessee, Knoxville (UTK). Results indicate that the COMP exam and the 
Academic Profile are equivalent in their coverage of UTK's general 
education goals. Regarding the structural component of test use, the 
Academic Profile is superior to the COMP exam in its ability to accurately 
differentiate among students/programs. Analysis of the structural component 
also reveals that both tests measure a single underlying construct, and 
analysis of the external component suggests that this construct is academic 
ability, not program quality. Analysis of data from the two tests also 
suggests that the COMP exam is somewhat more sensitive to educational 
effects than is the Academic Profile, but only after the effects of 
entering ability/achievement are removed. 

Pike, G. R., & Phillippi, R. H. (1989). Generalizability of the 
differential coursework methodology: Relationships between self-reported 
coursework and performance on the ACT-COMP exam. Research in Higher 
Education, 30, 245-260. 

The Differential Coursework Patterns Project (DCPP) appears to offer a 
method of linking outcomes measures to program data. However, questions 
about the generalizability of this method have been raised. This study 
indicates that the DCPP methodology can be used with at least two different 
tests, and that coursework measures can be gathered either through tran- 
script analysis or students' self-reports. Specific findings of this 
research on 2943 seniors indicate that residuals for the six COMP subscores 
(produced by regressing subscores on ACT Assessment scores, age, and moti- 
vation scores) are related to coursework clusters. For example, coursework 
in the humanities tends to be related to residuals for Solving Problems, 
Clarifying Values, and Using the Arts, while calculus and physical science 
coursework tends to be related to residuals for Communicating and Using 
Science and Technology. 
Pike, G. R., & Phillippi, R. H. (1989, May). Using generalizability theory 
in institutional research. Paper presented at the Forum of the Association 
for Institutional Research, Baltimore. 

For assessment practitioners, errors of measurement are of critical concern 
because they can lead to incorrect judgments about the quality of education 
programs and suggest inappropriate strategies for program improvement. 
This paper describes the use of generalizability theory in identifying 
errors of measurement and indicates that variance in students and items 
should be considered sources of error when group means are the unit of 
analysis. Using a synthetic data set of 9000 respondents that is very 
similar to the national data compiled by ACT, the authors conclude that the 
use of group means (instead of individuals) as the unit of analysis will 
likely decrease the dependability of measurement, not increase it. 
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TENTATIVE OUTLINE FOR NASHVILLE WORKSHOP: 
"Strategies for Assessing Outcomes" 

Sponsored by the Assessment Resource Center 
1987 

Sunday, March 22 

7:00 - 9:00 p.m. REGISTRATION for early arrivals 
Monday, March 23 

8:00 - 8:45 a.m. REGISTRATION and Coffee 

8:45 - 9:00 WELCOME and Introductions—by Peter Ewell 

OVERVIEW OF ASSESSMENT 

9:00-9:15 At Alverno College— by Georgine Loacker 

9:15 - 9:30 At Northeast Missouri State University— 

by Stuart Vorkink 
9:30 - 9:45 At The University of Tennessee, Knoxville— 

by Trudy Banta and Homer Fisher 

9:45 - 10:30 INITIATING AN ASSESSMENT PROGRAM - Essential Elements 

Consultant Panel moderated by Peter Ewell 
10:30 - 11:00 Questions from Participants about Initiating a Program 

11:00 - 11:15 BREAK (Tables of materials provided by Center and by 

participants available for review) 

11:15 - 12:30 p.m. ASSESSMENT METHODS - Concurrent Sessions 

Assessing Student Abilities - Georgine Loacker 
Assessing General Education Outcomes - 

Aubrey Forrest of ACT 
New Approaches from ETS - Roy Hardy of ETS 
Assessing Student Learning in the Major - 

Stuart Vorkink and Trudy Banta 
Using Survey Data in Institutional Planning - 

Homer Fisher and Bill Lyons of UTK 
Using Available Campus Data - Peter Ewell 
Filmed Interview with UTK Faculty Concerning 

Assessment - Gary Pike of UTK 

12:30 - 1:45 LUNCHEON and Opportunities for Groupings by 

Institutional Type 

2:00 - 3:15 CONCURRENT SESSIONS (Reprise) 

3:15 - 3:30 BREAK 

3:30 - 4:15 PANEL OF CONSULTANTS Summarizing Concerns Expressed 

in Concurrent Sessions 

4:15 - 5:00 MEETINGS OF INSTITUTIONAL TEAMS to Discuss Plans for 

Assessment and Contribute Questions for Tuesday 
Morning Session (all consultants available for 
conferences). 

5:00 - 6:15 RECEPTION (wine-cheese-fruit) - Meetings by 

Institutional Type 

6:30 Free for Supper in Nashville 
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Tuesday, March 24 



8:00 - 8:30 a.m. CONTINENTAL BREAKFAST 

8:30 - 10:00 CONSULTANT PANEL Addressing Various Topics (Some 

submitted by participants Monday evening) 

External incentives and internal benefits of 
assessment 

Motivating faculty and students to participate 

Organizing for data collection 

Converting data to information for planning 

Financing assessment activities 

The current politics of assessment 

10:00 - 11:00 Consultants Respond to Additional Questions from 

Participants 

11:00 - 11:15 BREAK 

11:15 - 12:00 CONCURRENT SESSIONS 

"What Works in Comprehensive Universities" 
"What Works in Liberal Arts Institutions" 
(Previously identified participants will present 
brief reports on successful practices at their 
own institutions) 

12:00 - 1:30 LUNCH 

1:30 - 3:00 Opportunities for teams to confer with consultants 

(Optional activity) 
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USING OUTCOMES INFORMATION IN THE ASSESSMENT 
OF INSTITUTIONAL EFFECTIVENESS 



— WORKSHOP PROGRAM — 



Thursday, August 27, 1987 



11:00AM - 1:00PM   WORKSHOP REGISTRATION (Meeting Area Lobby) 

1:00 - 2:00        USING OUTCOMES INFORMATION TO ASSESS 
                   THE ACCOMPLISHMENT OF INSTITUTIONAL 
                   MISSION AND GOALS - Homer S. Fisher, 
                   Executive Vice-Chancellor for Business, 
                   Planning and Finance; Trudy W. Banta, 
                   Research Professor and Director, Assessment 
                   Resource Center (Cherokee "C") 

2:00 - 2:30        SYSTEM ENVIRONMENT SUCCESS FACTORS - 
                   F. Ramsey Valentine, Director, Office of 
                   Information Systems (Cherokee "C") 

2:30 - 3:15        USING EXISTING CAMPUS INFORMATION - 
                   John T. Hemmeter, Director, Office of 
                   Institutional Research (Cherokee "C") 

3:15 - 3:30        REFRESHMENT BREAK (Meeting Area Lobby) 

3:30 - 4:15        GATHERING INFORMATION THROUGH SURVEYS - 
                   William Lyons, Professor, Political 
                   Science and Associate Director, Office of 
                   Institutional Research (Cherokee "C") 

4:15 - 5:00        GATHERING AND USING TEST DATA: AN 
                   AUTOMATED APPROACH - Edward L. Medford, 
                   Senior Programmer/Analyst, Office of 
                   Information Systems (Cherokee "C") 

5:00 - 5:15        WORK OF THE ASSESSMENT RESOURCE CENTER - 
                   Gary R. Pike, Assistant Director, 
                   Assessment Resource Center (Cherokee "C") 

5:15 - 5:45        DISCUSSION (Cherokee "C") 

6:00 - 7:00        RECEPTION FOR PARTICIPANTS, UTK FACULTY, 
                   AND ADMINISTRATORS (Great Smoky 
                   Mountain Center) 
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Friday, August 28, 1987 



8:00AM - 8:30AM    CONTINENTAL BREAKFAST (Meeting Area Lobby) 

8:30 - 9:15        TECHNICAL CONSIDERATIONS IN USING SURVEYS 
                   AND EXISTING CAMPUS DATA - John T. Hemmeter, 
                   William Lyons, & Don S. Scroggins, Project 
                   Coordinator, Office of Institutional Research 
                   (Cherokee "C") 

9:15 - 10:00       TECHNICAL CONSIDERATIONS IN USING TEST 
                   DATA - Edward L. Medford (Cherokee "C") 

10:00 - 10:30      DISCUSSION (Cherokee "C") 

10:30 - 10:45      REFRESHMENT BREAK (Meeting Area Lobby) 

10:45 - 11:15      MOTIVATING PARTICIPANTS AND REPORTING 
                   ASSESSMENT RESULTS - Trudy W. Banta 
                   (Cherokee "C") 

11:15 - 11:45      RESEARCH USING ASSESSMENT DATA - 
                   Gary R. Pike (Cherokee "C") 

11:45 - 12:30PM    DISCUSSION (Cherokee "C") 

12:30 - 2:00       LUNCHEON (Cherokee "A&B") 
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STRATEGIES FOR ASSESSING OUTCOMES 
— WORKSHOP SCHEDULE ~ 



Sunday, November 8, 1987 

7:00pm - 9:00pm Early Registration (Hyatt Lobby) 

Monday, November 9, 1987 

8:00am - 8:30am Registration 

8:30 - 8:45 Welcome & Introduction - Peter Ewell 

8:45 - 9:15 Brief Overview, Assessment at Four Institutions 

9:15 - 10:15 Initiating an Assessment Program 

Panel Discussion Moderated by Peter Ewell 

10:15 - 10:30 Refreshment Break 

10:30 - 11:45 Assessment Strategies (Concurrent Sessions) 

Noon - 1:15pm Luncheon 

1:30 - 2:45 Concurrent Sessions (Reprise) 

2:45 - 3:00 Break 

3:00 - 3:45 Summary of Questions/Concerns Expressed in the 
Concurrent Sessions 

3:45 - 4:45 Meetings of Institutional Teams (Consultants Available) 

5:00 - 6:15 Reception 

6:30 - Free for Supper in Memphis 

Tuesday, November 10, 1987 

8:00am - 8:30am Continental Breakfast 

8:30 - 10:00 Panel Discussion 

External Incentives and Internal Benefits 
Motivating Faculty and Students 
Organizing for Data Collection 

(Additional Topics may be Submitted by Participants) 
10:00 - 10:15 Break 

10:15 - 11:15 Concurrent Sessions (by Institutional Type/Size) 

11:15 - Noon Summary of Topics Discussed in the Concurrent Sessions 

and Workshop Summary by the Panel 

Noon - 1:15 Luncheon 
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STRATEGIES FOR ASSESSING OUTCOMES 
— WORKSHOP SCHEDULE — 

Sunday, November 13, 1988 

7:00pm - 9:00pm Early Registration (Holiday Inn Lobby) 

Monday, November 14, 1988 

8:00 - 8:30am Registration 
8:30 - 8:45 Welcome & Introduction 

8:45 - 9:15 Brief Overview, Assessment at Four Institutions 

9:15 - 10:15 Initiating an Assessment Program - Panel Discussion 

10:15 - 10:30 Refreshment Break 

10:30 - 11:45 Assessment Strategies (Concurrent Sessions) 

- Assessment as Learning 

- Does Assessment Make a Difference? A Case for 

Computer Adaptive Testing 

- [illegible] and Assessment in the General [illegible] 

- The Campus Political Context for Assessment: 
Faculty, Staff, and Students 

- Organizing the Campus for Assessment 

- Using Surveys in Assessment 

- Using Available Campus Data 

- Developing Exams in the Major Field - Faculty 

Perspective 

Noon - 1:15 Luncheon 

1:30 - 2:45 Concurrent Sessions (Reprise) 

2:45 - 3:00 Break 



3:00 - 3:45 Summary of Questions/Concerns Expressed in the 
Concurrent Sessions - Panel Discussion 

4:00 - 5:00 High Tea 

5:00 - 6:00 Meetings of Institutional Teams and/or Regional Groups 
of Institutions (Panel Members Available) 

6:00 Free for supper in Knoxville 
Tuesday, November 15, 1988 

8:00am - 8:30am Continental Breakfast 

8:30 - 10:00 Panel Discussion 

- External Incentives and Internal Benefits of 

Assessment 

- Motivating Faculty and Students to Participate 

- Organizing for Data Collection 

- Additional Topics to be Submitted by Participants 



10:00 - 10:15 Break 

10:15 - 11:15 Concurrent Discussion Sessions (by Institutional 

Type/Size) (Panel Members Available) 

11:15 - Noon Summary of Topics Discussed in the Concurrent Sessions 

and Workshop Summary by the Panel 

Noon - 1:15 Luncheon 



APPENDIX E 
Participants at Three Small Seminars 
April 1988 Kansas City Seminar Participants 

Michael Field     Professor of English, Bemidji State University (MN) 

Arnold Gelfman    Director of Institutional Research, Brookdale 
                  Community College (NJ) 

Gary Moden        Office of Institutional Research, Ohio University 

Linda Rudolph     Associate Vice President, Austin Peay State 
                  University (TN) 

Howard Benoist    Vice President, Our Lady of the Lake College (TX) 

Scott Olsen       Director of Institutional Research, Northeast Missouri 
                  State University 

John Finney       Associate Dean, University of Puget Sound (WA) 

Aubrey Forrest    ACT 

Joe Steele        ACT 

David Lutz        ACT 

Trudy Banta       Director, Assessment Resource Center, UTK 

Gary Pike         Associate Director, Assessment Resource Center, UTK 
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October 1988 Princeton Seminar Participants 

John Harris        Professor, David Lipscomb University 

Tom Moran          Associate Vice President, SUNY Plattsburgh 

Marcia Mentkowski  Director, Office of Assessment, Alverno College 

Glen Rogers        Assistant Director, Office of Assessment, Alverno 
                   College 

Michael Knight     Co-Director, Office of Assessment, Kean College (NJ) 

Donald Lumsden     Co-Director, Office of Assessment, Kean College (NJ) 

Tony Golden        Professor, Austin Peay State University (TN) 

Dary Erwin         Director, Office of Assessment, James Madison 
                   University (VA) 

Peter Ewell        Senior Associate, National Center for Higher 
                   Education Management Systems 

Richard Burns      Vice President, ETS 

Trudy Banta        Director, Assessment Resource Center, UTK 

Gary Pike 
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Appendix A: 



An Annotated Bibliography on 
the Assessment of Student Educational Outcomes 



by Gary Pike 



The 1970s and 1980s have witnessed a dramatic growth in the 
[illegible] of the assessment of student educational outcomes. 
Because of the variety of information available, this 
[illegible] is not intended to be exhaustive. Instead, this 
review is provided as a starting point for the study of 
assessment. For convenience, the literature on assessment is organized 
[illegible]: the basic principles underlying assessment 
programs, the identification of educational outcomes, the 
[illegible], and the analysis of outcomes data. 
When an ED number is indicated (in brackets), the document cited 
[illegible]. 

Principles Underlying Assessment Programs 

Adelman, C., "To Imagine an Adverb: Concluding Notes to 
Adversaries and Enthusiasts," in Adelman, C. (ed.), Assessment in 
American Higher Education: Issues and Contexts. Washington, 
D.C.: U.S. Government Printing Office, 1986, pp. 73-82. 
Assessment (evaluation) is rooted in the nature of language and 
[illegible]. Inherent in language, assessment shapes and is 
shaped by social and economic institutions. Based on this 
[illegible], the author identifies several important measurement, 
organizational and policy concerns related to the growing 
interest in assessment by institutions of higher education. 

Baker, E.L., "Critical Validity Issues in the Methodology of 
Higher Education Assessment," Assessing the Outcomes of Higher 
Education [illegible], pp. 3?-46. Baker examines the growing 
interest in assessment, arguing that assessment programs designed 
to measure effectiveness criteria established by state 
[illegible] and regional accrediting associations often do not 
represent valid means of evaluating educational quality. The 
author contends that students' classroom experiences represent 
[illegible] indicators of quality. As a result, the author 
recommends that assessment programs focus on outcomes directly 
related to classroom experiences. 

Bergquist, W.H., and Armstrong, J.L., Planning Effectively for 
Educational Quality: An Outcomes-Based Approach for Colleges 
Committed to Excellence. San Francisco: Jossey-Bass, 1986. 
The authors provide a model for improving educational quality 
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based on an examination of institutional mission. They advocate 
that institutions examine their missions, develop pilot programs 
designed to assist in accomplishing those missions, assess the 
effectiveness of the pilot programs, and then implement large- 
scale programs. Of relevance to those interested in assessment, 
the authors stress the importance of incorporating outcomes data 
in the planning process. 

Chandler, J.W., "The Why, What, and Who of Assessment: The 
College Perspective," Assessing the Outcomes of Higher Education:
Proceedings of the 1986 ETS Invitational Conference. Princeton:
ETS, 1987, pp. 11-18. According to Chandler, assessment should 
focus on programs, not individuals. Assessment also should 
reflect the unique characteristics of an institution. Tailoring 
an assessment program to an institution encourages faculty 
ownership of the assessment program. In addition, the author 
explains why assessment should not be equated with testing. 

Cross, K.P., "Using Assessment to Improve Instruction," Assessing
the Outcomes of Higher Education: Proceedings of the 1986 ETS
Invitational Conference. Princeton: ETS, 1987, pp. 63-70. The
author argues that evaluations of classroom teaching should be an 
integral part of an assessment program. The author notes that 
one of the best ways to overcome faculty resistance to an 
assessment program is to provide the faculty with the tools to 
assess student learning and satisfaction. 

Enthoven, A.C., "Measures of the Outputs of Education: Some
Practical Suggestions for Their Development and Use," in
Lawrence, B., Weathersby, G., and Patterson, V.W. (eds.), Outputs
of Higher Education: Their Identification, Measurement, and
Evaluation. Boulder, CO: Western Interstate Commission for Higher
Education, 1970, pp. 51-60. [ED#043-296] The author makes three
recommendations about the assessment of educational outcomes: 
first, assessment should be coupled with financial incentives; 
second, external evaluations (rather than course examinations) 
should be used in the assessment program; and third, assessment 
activities should be conducted by a central office of program 
analysis and review. 

Ewell, P.T., The Self-Regarding Institution: Information for
Excellence. Boulder, CO: National Center for Higher Education
Management Systems, 1984. This volume focuses on the rationale
underlying the assessment of student outcomes. The author begins
by identifying four dimensions of student outcomes: "Knowledge
Outcomes," "Skills Outcomes," "Attitude and Value Outcomes," and
"Relationships with Society and with Particular Constituencies." 
The author also provides examples of how institutions have 
utilized outcomes data in their planning processes to improve 
education programs.

focusing on two critical aspects of educational assessment: the
specification of objectives and the measurement of the extent to
which these objectives are being met. The author concludes that
educational improvement will occur only if assessment is an
integral part of a process of curriculum development and
evaluation. The argument is based primarily on assessment
practices in the United Kingdom, but the principles are
universally applicable. A second edition is scheduled for
publication in 1988.

Loacker, G., Cromwell, L., and O'Brien, K., "Assessment in Higher
Education: To Serve the Learner," in Adelman, C. (ed.),
Assessment in American Higher Education: Issues and Contexts.
Washington, D.C.: U.S. Government Printing Office, 1986, pp. 47-
63. Loacker et al. assert that the ultimate goal of an assessment
program should be to promote student learning and development.
The authors thus view assessment as a multitrait and multimethod
technique, and stress the need to develop evaluation activities
outside the traditional student-faculty process.

Manning, T.E., "The Why, What, and Who of Assessment: The
Accrediting Association Perspective," in Assessing the Outcomes
of Higher Education: Proceedings of the 1986 ETS Invitational
Conference. Princeton: ETS, 1987, pp. 31-38. Manning examines
assessment from the perspective of the regional accrediting
associations, noting that these associations have been advocating
the use of assessment to improve institutional quality for
several years.

Identification of Educational Outcomes

Before developing an assessment program, institutions must
identify the outcomes to be assessed. In an effort to bring some
coherence to this undertaking, several scholars have developed
typologies of educational outcomes. While these typologies
differ in many important respects, they all assume that student
outcomes are multidimensional. The common outcomes described in
these typologies can be grouped into four categories: cognitive
outcomes (both knowledge and skills); affective outcomes (such as
self-concept and moral development); attitudinal outcomes
(including involvement and satisfaction); and outcomes expressed
in terms of longer-term economic and social status (and,
sometimes, participation in cultural, community and political 



Alexander, J.M. and Stark, J.S., Focusing on Student Academic
Outcomes: A Working Paper. Ann Arbor, MI: National Center for
Research to Improve Postsecondary Teaching and Learning, 1987.
These authors provide an overview of three typologies of student
educational outcomes (Astin, Panos, and Creager). In addition,
they provide brief descriptions of instruments designed to assess 
student outcomes in three areas: academic-cognitive, academic- 
motivational, and academic-behavioral. 

Astin, A.W., "Measuring Student Outputs in Higher Education," in
Lawrence, B., Weathersby, G., and Patterson, V.W. (eds.), Outputs
of Higher Education: Their Identification, Measurement, and
Evaluation. Boulder, CO: Western Interstate Commission for Higher
Education, 1970, pp. 75-84. [ED#043-296] Astin discusses the
measurement and analysis of educational outputs from a modeling
perspective, presenting a classic model of the educational
process consisting of three components: student inputs, the
college environment, and student outputs. In addition, the
author stresses the importance of conducting multitrait/ 
multimethod research over time. 

Astin, A.W., "The Methodology of Research on College Impact, Part
One." Sociology of Education, vol. 43 (1970), pp. 223-254.
Arguing that research on student development should consist of
multi-institutional longitudinal studies, Astin identifies
several research designs and statistical procedures that are
appropriate for assessing student educational outcomes. Astin 
also discusses technical issues related to detecting interaction 
effects and controlling for the effects of measurement error. 

Astin, A.W., "Measurement and Determinants of the Outputs of
Higher Education," in Solmon, L. and Taubman, P. (eds.), Does
College Matter? Some Evidence of the Impacts of Higher
Education. New York: Academic Press, 1973, pp. 107-127. In this
article, Astin discusses the relationships among type of
outcome, type of data, and time. The first two dimensions form a
taxonomy consisting of cognitive-psychological, cognitive-
behavioral, affective-psychological, and affective-behavioral
outcomes.

Bloom, B.S. (ed.) Taxonomy of Educational Objectives,
Handbook I: Cognitive Domain. New York: David McKay, 1956.
This classic work details a hierarchy of educational objectives, 
ranging from lower-order outcomes, such as knowledge recall, to 
higher-order outcomes, such as synthesis and evaluation. 
Examples of measurement techniques for evaluating the attainment 
of each level in the hierarchy are also provided. 

Brown, D.G., "A Scheme for Measuring the Output of Higher
Education," in Lawrence, B., Weathersby, G., and Patterson, V.W.
(eds.), Outputs of Higher Education: Their Identification,
Measurement, and Evaluation. Boulder, CO: Western Interstate
Commission for Higher Education, 1970, pp. 27-40. [ED#043-296] 
Brown examines the outputs of higher education from a measurement 
perspective, identifying five categories of educational outcomes 
and six characteristics of effective measurement. Based on his
categories, several specific measures are identified. He also
presents a simple model that can be used to assess educational 
outcomes. 

College Outcomes Evaluation Program, New Jersey Department of 
Higher Education. "Final Report of the Student Learning Outcomes 
Subcommittee." Trenton: Author, 1987. In its report, the
subcommittee examines the purpose of statewide assessment in New 
Jersey and identifies the types of student outcomes to be 
assessed. These outcomes include the general intellectual skills 
needed to analyze and utilise new information, the skills needed 
to understand and use different modes of inquiry, and the 
abilities necessary to appreciate various "continuities in the 
human experience." 

Korn, H.A. Psychological Models of the Impact of College on 
Students. Ann Arbor, MI: National Center for Research to Improve
Postsecondary Teaching and Learning, 1987. Korn describes five 
perspectives on the relationship between college experiences and 
student educational outcomes, and discusses the implications of 
recent advances in personality theory for the assessment of 
student outcomes. Korn also suggests several ways in which the 
models can be used to evaluate the impact of college on students. 

Lenning, O.T. Previous Attempts to Structure Educational Outcomes
and Outcome-Related Concepts: A Compilation and Review of the
Literature. Boulder: National Center for Higher Education
Management Systems, 1977. This report provides a taxonomy of
educational outcomes based on two literature reviews. Impacts of
higher education on individuals include intellectual development, 
emotional/cultural/social development, and physical development. 
In addition, the author includes potential impacts of higher 
education on society. 

Pace, C.R., "Perspectives and Problems in Student Outcomes
Research," in Ewell, P.T. (ed.), Assessing Educational Outcomes.
New Directions for Institutional Research No. 47. San Francisco: 
Jossey-Bass, 1985, pp. 7-18. Pace presents a general overview of 
basic assessment techniques and instruments, identifying four 
categories of outcomes, as well as instruments designed to 
measure these outcomes. Given the variety of outcomes and
instruments that may be used in an assessment program, the author 
stresses the importance of selecting outcomes consistent with the 
institution's mission and goals. 

Pascarella, E.T., "College Environmental Influences on Learning 
and Cognitive Development: A Critical Review and Synthesis," in 
Smart, J.C. (ed.), Higher Education: Handbook of Theory and
Research. New York: Agathon Press, 1985, pp. 1-62. Pascarella
presents a comprehensive synthesis of research on the factors 
influencing students' cognitive development during their college 
careers. He defines two categories of cognitive outcomes
(knowledge and skills), discusses related research, and
identifies several instruments that can be used to measure
various educational outcomes.

Measurement of Educational Outcomes 
Development of Measures 

Once relevant student outcomes have been identified, methods
of measurement must be selected or developed for each
outcome. Assessment programs generally have relied on two types
of measures: surveys and tests.

mSit? SjSiJf f in J erest and Pilot tests can evaluate item 

Dillman, D.A. Mail and Telephone Surveys: The Total Design
Method. New York: John Wiley and Sons, 1978. Dillman presents an
overview of survey research methodology. Specific topics
addressed include question writing and formatting.

Outcomes." Research in Higher Education, vol. 2 ..., pp. 37-...
This article reports research designed to identify the
relationship between test scores and self-reports of learning.
Results indicate that the two indicators evidence moderate
correlations, and the authors conclude that self-reports are valid
measures of learning.

Ebel, R.L. "Content Standard Test Scores." Educational and
Psychological Measurement, vol. 22 (1962), pp. 15-25. This
article recommends that test scores be interpreted as content
scores indicating a student's level of mastery of a
given content area. Ebel argues that content scores should be
used to supplement normative scores, and provides an extended
example of content scores.

Frederiksen, N. and Ward, W.C. Development of Measures for the
Study of Creativity. GRE Research Report GREB 72-2P. Princeton:
Educational Testing Service, 1975. In the research described in
this report, four tests of scientific creativity were developed:
formulating hypotheses, evaluating proposals, solving methodolo- 
gical problems, and measuring constructs. Results from a 
universe of 4,000 students applying to graduate schools indicated 
that the measures evidence acceptable levels of reliability. In 
addition, scores on each of the four measures were found to be 
independent of scores on aptitude and achievement tests. 

Gronlund, N.E. Constructing Achievement Tests. 3rd edition. 
Englewood Cliffs, NJ: Prentice-Hall, 1983. This short book 
provides a basic introduction to the construction of achievement 
tests. The author addresses all phases of test preparation and 
evaluation, and discusses issues related to the construction and 
scoring of both objective and essay tests. 

Grosof, M.S. and Sardy, H. "Procedure: Measurement,
Instrumentation, and Data Collection," in A Research Primer for
the Social and Behavioral Sciences. Orlando, FL: Academic Press,
1985, pp. 133-168. These authors provide an overview of several 
measurement techniques, including surveys. They identify the 
various types of questions used in survey research and describe 
several approaches to scaling. They also provide several basic 
recommendations regarding question wording and discuss approaches 
to evaluating questionnaire reliability and validity. 

Hambleton, R.K., "Determining Test Length," in Berk, R.A. (ed.),
A Guide to Criterion-Referenced Test Construction. Baltimore:
Johns Hopkins University Press, 1984, pp. 144-168. Hambleton 
notes that test length has important implications for the 
reliability and validity of criterion-referenced tests. Five 
different methods of determining test length are described, and 
factors influencing the selection of one of these methods are 
identified. 

Marshall, J.C. and Hales, L.W. Essentials of Testing. Reading,
MA: Addison-Wesley, 1972. Marshall and Hales provide a 
nontechnical discussion of a variety of approaches to test 
construction. In addition to identifying several principles of 
educational measurement, the authors detail the strengths and 
weaknesses of essay tests, completion tests, multiple-choice 
tests, and true-false tests. 

Martuza, V.R. Applying Norm-Referenced and Criterion-Referenced
Measurement in Education. Boston: Allyn and Bacon, 1977.
Martuza describes the use of norm-referenced and criterion- 
referenced tests in educational research. Regarding norm- 
referenced tests, Martuza explains the importance of selecting 
appropriate norm groups, provides criteria for evaluating norms, 
and provides a step-by-step guide for test construction. Martuza
also suggests several approaches to constructing criterion-
referenced exams, including linguistic transformation,
item-form/item-frame, amplified objectives, and facet design.



Mehrens, W.A. and Ebel, R.L. "Some Comments on Criterion-
Referenced and Norm-Referenced Achievement Tests." NCME
Measurement in Education, vol. 10, (1979), pp. 1-8. [ED#182-324]
The authors discuss two approaches to achievement testing: norm-
referenced and criterion-referenced tests. In addition to
defining these two types of tests, the authors conclude that
norm-referenced tests are most appropriate for evaluating
curriculum, while criterion-referenced exams are most appropriate
for evaluating students' mastery levels. 

Milton, O. and Eison, J.A. Textbook Tests: Guidelines for Item
Writing. New York: Harper and Row, 1983. This is a basic 
introduction to writing test items. The authors underscore the 
importance of well-designed tests and offer several practical 
suggestions concerning item writing. They also include a series 
of exercises that allow the reader to identify the weaknesses of 
test questions. 

Popham, W.J. "Specifying the Domain of Content or Behaviors," in 
Berk, R.A. (ed.), A Guide to Criterion-Referenced Test
Construction. Baltimore: Johns Hopkins University Press, 1984, 
pp. 49-77. Popham addresses the issue of how to specify the 
areas of content and/or behavior to be covered in a test, 
stressing the importance of explicit test specification and 
congruent test item development. The author also makes several 
practical suggestions regarding the specification process that
have implications for subsequent steps in the test development 
process. 

Roid, G.H. "Generating the Test Items," in Berk, R.A. (ed.), A
Guide to Criterion-Referenced Test Construction. Baltimore:
Johns Hopkins University Press, 1984, pp. 49-77. Roid reviews
several item-writing techniques and argues that the quality of
the items generated in the test construction process can be
enhanced if the items are based on empirical research. Four
steps in the empirically derived item-writing process are
identified.



Macro-Evaluation of Measures 



Macro-evaluation of student outcomes measures is concerned 
with the reliability and validity of these measures. There are 
many approaches to evaluating the reliability of outcomes, 
ranging from classical correlational techniques to techniques 
that assess the internal consistency of measures based on 
generalizability theory. Because assessment efforts frequently 
have multiple purposes, multiple approaches to evaluating 
instrument reliability frequently are necessary.

The second major criterion for evaluating assessment
instruments is validity. Instruments can be evaluated in terms
of their content validity, criterion-related validity, and their 
construct validity. As with reliability, the type of validity 
evaluated may change depending on the purpose of the assessment 
program. 
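
As a minimal sketch of the internal-consistency approach to
reliability mentioned above, the following Python function computes
coefficient alpha (Cronbach's alpha) from a table of item scores. The
function name, the data layout (one row per examinee, one column per
item), and the sample scores are assumptions made for illustration;
none of them come from the works cited in this section.

```python
from statistics import pvariance


def cronbach_alpha(scores):
    """Coefficient alpha for a list of rows (examinees) of item scores.

    alpha = k / (k - 1) * (1 - sum(item variances) / variance(total)),
    where k is the number of items.
    """
    n_items = len(scores[0])
    if n_items < 2:
        raise ValueError("alpha requires at least two items")
    # Variance of each item (column) across examinees.
    item_variances = [
        pvariance([row[i] for row in scores]) for i in range(n_items)
    ]
    # Variance of each examinee's total score.
    total_variance = pvariance([sum(row) for row in scores])
    if total_variance == 0:
        raise ValueError("total scores do not vary; alpha is undefined")
    return (n_items / (n_items - 1)) * (1 - sum(item_variances) / total_variance)


if __name__ == "__main__":
    # Hypothetical scores: four examinees, three items.
    sample = [[2, 3, 3], [4, 4, 5], [1, 2, 2], [5, 5, 4]]
    print(round(cronbach_alpha(sample), 3))
```

Values near 1 suggest that the items vary together. What counts as an
acceptable value depends on the purpose of the assessment program, as
the paragraph on reliability notes.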

Anastasi, A. Psychological Testing. 4th edition. New York:
Macmillan, 1976. This book is a basic reference work on the
development, use, and evaluation of psychological tests. Topics
addressed include ethical issues in the use of psychological
tests, evaluation of instrument reliability and validity, and
item analysis. In addition, the author identifies and analyzes
several different types of tests, ranging from educational
(achievement) tests to personality measures. 

Berk, R.A. (ed.) A Guide to Criterion-Referenced Test
Construction. Baltimore: Johns Hopkins University Press, 1984.
This book contains essays that provide a technical discussion of 
the construction and evaluation of criterion-referenced tests. 
Essays on the evaluation of tests address issues of reliability 
and validity, noting that the decision to utilize a specific 
approach must be guided by the intended uses of the test data. 
In addition, essays on evaluating the reliability of cut-off 
scores and categorizations based on cut-off scores are included. 

Cronbach, L.J. "Test Validation," in Thorndike, R.L. (ed.),
Educational Measurement. 2nd edition. Washington, D.C.: American
Council on Education, 1971, pp. 443-507. Cronbach's essay is a 
touchstone for understanding test validity. The author explains 
the goals of validation procedures and examines several types of 
validity: content validity, educational importance, construct 
validity, validity for selection, and validity for placement. 

Cronbach, L.J. and Meehl, P.E. "Construct Validity in 
Psychological Tests." Psychological Bulletin, vol. 52, (1955),
pp. 281-302. These authors examine procedures for validating 
psychological tests, focusing on construct validity. They 
indicate when construct validation of tests is appropriate and 
examine the assumptions underlying construct validity. 

Gardner, E. "Some Aspects of the Use and Misuse of Standardized
Aptitude and Achievement Tests," in Wigdor, A.K. and Garner,
W.R. (eds.), Ability Testing: Uses, Consequences, and
Controversies: Part II. Washington, D.C.: National Academy
Press, 1982, pp. 315-332. Gardner identifies six categories of
misuse associated with an unquestioning reliance on standardized
tests: acceptance of the test title for what the test measures;
ignoring the error of measurement in test scores; use of a single
test score for decision making; lack of understanding of test
score reporting; attributing cause of behavior measured to the 
test; and test bias.

Linn, R. "Ability Testing: Individual Differences, Prediction
and Differential Prediction," in Wigdor, A.K. and Garner, W.R.
(eds.), Ability Testing: Uses, Consequences, and Controversies:
Part II. Washington, D.C.: National Academy Press, 1982, pp.
335-388. This essay examines the use of standardized tests to
identify individual differences. The author addresses issues
related to criterion and predictive validity for educational and

Lehmann provide a general overview of measurement and evaluation
in education. The chapter on reliability presents approaches to
estimating reliability based on correlational and
generalizability theories. The chapter on validity identifies
several different types of validity and presents methods for ...

Stanley, J.C. "Reliability," in Thorndike, R.L. (ed.),
Educational Measurement. 2nd edition. Washington, D.C.: American
Council on Education, 1971, pp. 356-442. This basic reference on
reliability in educational measurement examines the topic in
light of research on individual variation, and identifies
sources of variation in test scores. The author also
presents procedures for estimating reliability using classical
correlational techniques and generalizability theory, and
discusses methods of estimating the reliability of change scores.

Wigdor, A.K. and Garner, W.R. (eds.) Ability Testing: Uses,
Consequences, and Controversies: Part I. Washington, D.C.:
National Academy Press, 1982. Part I of this work is the report
of the Committee on Ability Testing of the Assembly of Behavioral
and Social Sciences, National Research Council. The report
provides an overview of ability testing (including the
controversies associated with ability testing), identifies the
uses of ability tests, and recommends a series of actions for the
evaluation and improvement of ability tests.

Micro-Evaluation of Measures

Micro-evaluation of outcomes measures is concerned with the
analysis of test questions (items). Several procedures are
available to analyze test items, ranging from relatively simple
item analysis procedures to mathematically sophisticated
procedures based on Item Response Theory (IRT). Approaches based
on IRT offer several advantages (e.g., item difficulty
estimates that do not vary according to the ability level of the
students). IRT procedures also have important applications in
detecting test bias and in developing tailored tests.

Berk, R.A. "Conducting the Item Analysis," in Berk, R.A. (ed.),
A Guide to Criterion-Referenced Test Construction. Baltimore:
Johns Hopkins University Press, 1984, pp. 97-143. Berk presents
a technical discussion of the procedures that should be used to 
determine if individual test items function as they were 
intended. He emphasizes that both expert judgment and 
statistical techniques should be used to evaluate test items. In 
addition to providing a discussion of specific judgmental and 
statistical tests, he identifies step-by-step procedures for item 
analysis. 
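
The simpler item analysis procedures referred to in this section can
be sketched in a few lines of Python. The example below is a generic
classical item analysis, not the step-by-step procedure Berk
describes: it computes each item's difficulty (proportion answering
correctly) and a discrimination index (the correlation between the
item and the total score on the remaining items). The function names
and the 0/1 response matrix are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation; returns NaN if either variable is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")
    return sxy / sqrt(sxx * syy)


def item_analysis(responses):
    """responses: one row per examinee, each a list of 0/1 item scores."""
    n_items = len(responses[0])
    results = []
    for i in range(n_items):
        item = [row[i] for row in responses]
        # Rest score: total on all other items, so the item does not
        # correlate with itself.
        rest = [sum(row) - row[i] for row in responses]
        results.append(
            {
                "difficulty": sum(item) / len(item),
                "discrimination": pearson(item, rest),
            }
        )
    return results


if __name__ == "__main__":
    # Hypothetical data: four examinees, three items.
    data = [[1, 1, 1], [1, 1, 0], [1, 0, 0], [0, 0, 0]]
    for number, stats in enumerate(item_analysis(data), start=1):
        print(number, stats)
```

Statistics of this kind supplement, rather than replace, the expert
judgment that Berk recommends using alongside statistical techniques.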

Diederich, P. Short-cut Statistics for Teacher-Made Tests.
Princeton: ETS, 1973. The author presents an introduction to 
the analysis of item quality for the less sophisticated 
mathematician. Topics addressed in the text include reliability, 
measurement error, and item analysis.

Hambleton, R.K. and Cook, L.L. "Latent Trait Models and Their 
Use in the Analysis of Educational Test Data." Journal of
Educational Measurement, vol. 14, (1977), pp. 75-96. This
article represents a general introduction to the use of latent 
trait (item response) models in education research. The authors 
begin by identifying the fundamental principles underlying latent 
trait theory, identify several common latent trait models, and 
suggest several applications for these models. 
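
To give a concrete sense of what a latent trait model looks like, the
sketch below implements the logistic item response function in its
widely used three-parameter form. It is offered as an illustration
only and is not taken from Hambleton and Cook's article. The
parameter names (a for discrimination, b for difficulty, c for the
lower asymptote or "guessing" level) follow common usage, and the
scaling constant that some presentations include is omitted here as a
simplifying assumption.

```python
from math import exp


def item_response_probability(theta, a=1.0, b=0.0, c=0.0):
    """Probability of a correct response at ability level theta.

    P(theta) = c + (1 - c) / (1 + exp(-a * (theta - b)))
    With c = 0 this is the two-parameter model; with c = 0 and a = 1
    it reduces to the one-parameter form.
    """
    return c + (1.0 - c) / (1.0 + exp(-a * (theta - b)))


if __name__ == "__main__":
    # Hypothetical item: moderate discrimination, average difficulty,
    # and a 20 percent chance of success by guessing.
    for ability in (-2.0, -1.0, 0.0, 1.0, 2.0):
        p = item_response_probability(ability, a=1.2, b=0.0, c=0.2)
        print(ability, round(p, 3))
```

Because the item parameters are defined on the same scale as ability,
the difficulty parameter b does not depend on the particular group of
students tested, which is the advantage of IRT noted in the
introduction to this section.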

Hambleton, R.K. and Swaminathan, H. Item Response Theory:
Principles and Applications. Boston: Kluwer-Nijhoff, 1985.
The authors provide a basic reference work on item response 
theory. Topics addressed include ability scales, model fitting, 
and practical applications of item response theory. 

Lord, F.M. Applications of Item Response Theory to Practical
Testing Problems. Hillsdale, NJ: Lawrence Erlbaum Associates,
1980. In this technical discussion of item response theory,
Lord identifies several applications of IRT, including tailored
testing, ability testing, studies of item bias, and estimation of 
true-score distributions. 

Office for Minority Education. An Approach for Identifying and
Minimizing Bias in Standardized Tests: A Set of Guidelines.
Princeton: ETS, 1980. This report explains the issues related 
to bias in testing, and presents a series of guidelines for 
eliminating item bias in test construction and evaluating 
existing tests to detect biased items. 



Assessment of Writing/Using Essay Examinations 



Breland, H.M., Camp, R., Jones, R.J., Morris, M.M., and
Rock, D.A. Assessing Writing Skill. Research Monograph No. 11.
New York: College Entrance Examination Board, 1987. These
authors describe a study designed to assess writing skill at six 
colleges and universities. Results indicated that the 
unreliability of essay scoring could be alleviated by relying on 
multiple essays or by combining objective and essay tests. The 
authors also demonstrate the use of a variety of data analysis 
techniques. Both essay and objective tests were found to be
about equal in their predictive validity. The authors conclude 
that multi-method assessment techniques offer both theoretical 
and practical advantages over other approaches. 

Coffman, W.E. "Essay Examinations," in Thorndike, R.L. (ed.), 
Educational Measurement. 2nd edition. Washington, D.C.: American 
Council on Education, 1971, pp. 271-302. In this chapter, the
author examines the advantages and limitations of essay tests as
assessment tools, with specific attention to issues related to
reliability and validity. In addition, the author offers
several suggestions for improving the use of essay exams. 

Coffman, W.E. "On the Validity of Essay Tests of Achievement." 
Journal of Educational Measurement, vol. 3, (1966), pp. 151-156.
This author reports research concerning methods of validating 
essay and objective tests. Traditionally, essay and objective 
tests have been correlated in order to demonstrate the predictive 
validity of objective tests. The author examines the predictive 
power of a sample of essay questions independent of objective 
measures. 

Cooper, P.L. The Assessment of Writing Ability: A Review of
Research. GRE Research Report GREB 82-15R. Princeton: ETS,
1984. The psychometric and practical issues related to the 
assessment of writing are the focus of this review. The author 
notes that although essay tests are considered to be more valid 
than multiple-choice tests, variability in subjects' scores may 
be influenced by a wide range of irrelevant factors. The author 
contends that when procedures to correct for threats to 
reliability and validity are employed, essay tests correlate very 
highly with multiple-choice tests. 

Crocker, L. "Assessment of Writing Skills Through Essay Tests," 
in Bray, D. and Belcher, M.J. (eds.), Issues in Student
Assessment. New Directions for Community Colleges, No. 59.
San Francisco: Jossey-Bass, 1987. In discussing the use of
essay tests in assessing basic writing skills, the author 
provides a rationale for using essay exams to assess writing 
abilities and identifies the steps required to develop a writing 
assessment program. These steps include: developing prompts 
(topics), developing scoring procedures, training raters, field 
testing, and administering the instruments. The author also 
examines issues related to the reliability and validity of essay 
exams.

Keeley, S.M., Browne, N.M., and Kreutzer, J.S. "A Comparison of
Freshmen and Seniors on General and Specific Essay Tests of
Critical Thinking." Research in Higher Education, vol. 17,
(1982), pp. 139-154. These authors report research utilizing 
essay tests to evaluate the critical-thinking skills of freshmen 
and seniors. Results indicate that educational experiences 
produce significant gains in critical-thinking skills. An 
important finding for assessment practitioners was that 
significant differences in students' writing samples are related 
to the type of instructions (general or specific) provided for 
the assessment. 

Steele, J.M. "The Assessment of Writing Proficiency via 
Qualitative Ratings of Writing Samples." Paper presented at the 
Annual Meeting of the National Council on Measurement in 
Education, San Francisco, 1979. [ED#175-944] Steele examines
several strategies for improving the reliability of raters' 
evaluations of writing samples. Research has indicated that 
increasing the number of writing samples per student to three 
significantly increases interrater reliability. However, using 
more than two raters does not improve reliability significantly. 
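
One classical way to reason about how reliability changes as more
samples are collected is the Spearman-Brown formula, sketched below.
This is a standard textbook result offered for illustration and is
not necessarily the analysis Steele used. It assumes the additional
writing samples behave as parallel measures, and the single-sample
reliability of 0.50 in the example is a hypothetical figure.

```python
def spearman_brown(reliability, length_factor):
    """Predicted reliability when a measure is lengthened.

    reliability: reliability of the current measure (e.g., one sample)
    length_factor: how many times longer the new measure is (e.g., 3
        for three writing samples instead of one)
    """
    if length_factor <= 0:
        raise ValueError("length_factor must be positive")
    return (length_factor * reliability) / (
        1 + (length_factor - 1) * reliability
    )


if __name__ == "__main__":
    single_sample = 0.50  # hypothetical reliability of one sample
    for samples in (1, 2, 3, 4):
        print(samples, round(spearman_brown(single_sample, samples), 3))
```

Under these assumptions each additional sample adds less than the one
before it, which is consistent with the general pattern of
diminishing returns described in the annotation above.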

Steele, J.M. "Trends and Patterns in Writing Assessment." Paper 
presented at the Annual Conference on the Assessment of Writing,
San Francisco, 1985. [ED#268-146] The author describes the 
writing assessment portion of the College Outcome Measures 
Project (COMP) Composite Examination. He notes that the COMP 
exam, unlike many writing assessment instruments, focuses on 
writing in problem solving and critical thinking situations. 
Instead of providing a single holistic rating, the COMP writing 
assessment provides scores in three areas of writing proficiency. 

White, E.M. Teaching and Assessing Writing. San Francisco:
Jossey-Bass, 1985. This book offers an overview of issues related to
the assessment of writing. Included are discussions of holistic 
scoring, the use of proficiency tests, selection and/or develop- 
ment of writing tests, and the evaluation/scoring of writing 
assignments. 



Nontraditional Outcomes Measures 



During the last decade, there has been a marked increase in 
the use of nontraditional approaches to assess student 
educational outcomes. As a general rule, these approaches have 
been intended as supplements to existing measurement techniques. 
Reliance on multiple assessment methods has been shown to improve 
the validity of evaluations. 

Most of the nontraditional measurement approaches have 
focused on the assessment of student performance through such
techniques as assessment centers, simulations, and external
evaluators. Exceptions have included computer-adaptive testing
and unobtrusive (nonreactive) measures.

Berk, R.A. (ed.) Performance Assessment: Methods and
Applications. Baltimore: Johns Hopkins University Press, 1986.
This reference work on performance assessment
includes essays covering a variety of performance assessment
methods ranging from behavior rating scales to assessment center
techniques. The authors also identify applications of
performance assessment in business, medicine and the law,
teaching, and the evaluation of communication skills.

Fong, B. The External Examiner Approach to Assessment.
Washington, D.C.: AAHE Assessment Forum, 1987. This monograph
examines the use of external examiners as an
assessment tool. While both British and American experience is
considered, special attention is paid to how [illegible].

Hsu, T. and Sadock, S.F. Computer-Assisted Test Construction: The
State of the Art. Princeton: ETS, 1985. [ED#272-515] These
authors review the growth and applications of computers in
developing and administering tests. They contend that adaptive
testing is one of the most successful uses of computers to
improve the quality of the assessment process.

Millman, J. "Individualizing Test Construction and Administration
by Computer," in Berk, R.A. (ed.), A Guide to Criterion-
Referenced Test Construction. Baltimore: Johns Hopkins
University Press, 1984. Millman presents a technical
review of the application of computers in test construction and
administration. Specific topics include traditional attempts to
individualize testing (equivalent forms of a test), item banking,
and computer-adaptive testing. Millman notes that the
proliferation of computer-adaptive tests has created a need for
further research on the cost effectiveness of this approach
and concludes that assessment practitioners should be very
[illegible].

Stillman, P.L. and Swanson, D.B. "Ensuring the Clinical
Competence of Medical School Graduates Through Standardized
Patients." Archives of Internal Medicine, vol. 147, (1987), pp.
1049-1052. The authors discuss the use of "Standardized
Patients" to assess medical students' interviewing and physical
examination skills. "Standardized Patients" are trained to
function in multiple roles and to simulate a physician-patient
encounter. Preliminary research suggests that this approach
offers a realistic means of standardizing performance assessment
for medical school graduates.

Terenzini, P.T. "The Case for Unobtrusive Measures." Assessing
the Outcomes of Higher Education: Proceedings of the 1986 ETS
Invitational Conference. Princeton: ETS, 1987, pp. 47-61.
Terenzini argues that traditional data collection methods (tests,
surveys, and interviews) should be supplemented by unobtrusive
measurement techniques that can overcome the sources of
measurement error present in other methods, and are relatively
inexpensive to administer. The author presents a typology
of unobtrusive measurement techniques that can be used to guide
the selection of particular measures.

Urry, V.W. "Tailored Testing: A Successful Application of Latent
Trait Theory." Journal of Educational Measurement, vol. 14,
(1977), pp. 181-196. Urry discusses the use of tailored
(computer-adaptive) tests and identifies future uses for
computer-adaptive testing.

[citation illegible] The authors describe the shortcomings of
traditional measurement techniques, make the case for
supplementing them with unobtrusive measures, and identify several
types of unobtrusive measurement. These measurement techniques
include physical traces, archival data, simple observation, and
contrived observation.

[citation illegible] The authors examine the use of
unobtrusive measures [illegible]. Of particular
interest to assessment practitioners, the authors identify six
ways in which unobtrusive measurement can modify traditional data
collection methods.



Value-Added Analysis of Outcomes Data 

The concept of the value added by college education
is controversial. Several scholars have criticized the concept
when value added is defined as a simple difference score. Writers
have suggested several alternatives to difference scores,
including residual and base-free measures of gain. Still other
scholars have suggested that repeated measures designs be used in
value-added analyses.

Bereiter, C. "Some Persisting Dilemmas in the Measurement of
Change," in Harris, C.W. (ed.), Problems in Measuring Change.
Madison: University of Wisconsin Press, 1963, pp. 3-20. In
addition to the problems inherent in the use of change scores,
Bereiter identifies and analyzes three dilemmas associated with
their use: over-correction/under-correction; unreliability/
invalidity; and physicalism/subjectivism.

Cronbach, L.J. and Furby, L. "How We Should Measure 'Change': Or
Should We?" Psychological Bulletin, vol. 74, (1970), pp. 68-80.
The authors examine how change scores are used, discuss the
assumptions underlying change scores, and present a formula for
calculating residual gain scores.
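
A residual gain score of the kind mentioned above is commonly computed by regressing posttest scores on pretest scores and taking each student's deviation from the predicted posttest score. The sketch below follows that common definition, which may differ in detail from the formula in the cited article; the function name and the scores are hypothetical.

```python
from statistics import mean


def residual_gains(pre, post):
    """Return posttest residuals after a least-squares regression on pretest.

    A positive value means the student scored higher on the posttest than
    the pretest score alone would predict for this group.
    """
    if len(pre) != len(post) or len(pre) < 2:
        raise ValueError("need two equal-length lists with at least 2 scores")
    mean_pre, mean_post = mean(pre), mean(post)
    ss_pre = sum((x - mean_pre) ** 2 for x in pre)
    if ss_pre == 0:
        raise ValueError("pretest scores must not all be identical")
    cross = sum((x - mean_pre) * (y - mean_post) for x, y in zip(pre, post))
    slope = cross / ss_pre
    intercept = mean_post - slope * mean_pre
    return [y - (intercept + slope * x) for x, y in zip(pre, post)]


# Hypothetical pretest and posttest scores for four students.
pretest = [10, 20, 30, 40]
posttest = [12, 25, 29, 44]
print([round(r, 2) for r in residual_gains(pretest, posttest)])
```

Unlike a simple difference score (posttest minus pretest), these residuals sum to zero and are uncorrelated with the pretest by construction, so they describe standing relative to the group rather than absolute growth.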

Gaito, J. and Wiley, D.E. "Univariate Analysis of Variance
Procedures in the Measurement of Change," in Harris, C.W. (ed.),
Problems in Measuring Change. Madison: University of Wisconsin
Press, 1963, pp. [illegible]. Horst describes
the [illegible] underlying a general multivariate model for
the evaluation of change. Horst examines the assumptions
underlying the [illegible] approach and provides a mathematical
model for the multivariate analysis of change. This model relies
on a [illegible] in which row vectors represent subjects
and column vectors represent administrations of an instrument.

Lord, F.M. "Elementary Models for Measuring Change," in Harris,
C.W. (ed.), Problems in Measuring Change. Madison: University of
Wisconsin Press, 1963. Lord examines several problems
inherent in the use of change scores, including unreliability
and regression effects. He notes that these problems can produce
spurious relationships between gain scores and other variables,
and presents a method for calculating true gain scores.

McMillan, J.H., "Techniques for Evaluating Value-Added Data: 
Judging Validity, Improvement, and Causal Inferences." Paper
presented at the annual meeting of the American Educational
Research Association, Washington, 1987. The author identifies
several limitations of value-added analyses and describes methods
of overcoming these limitations. He suggests that researchers
utilize appropriate research designs and statistical procedures
when evaluating difference scores. One possible source of
confirmatory data would be faculty judgments.

Pascarella, E.T., "Are Value-Added Analyses Valuable?" Assessing
the Outcomes of Higher Education: Proceedings of the 1986 ETS
Invitational Conference. Princeton: ETS, 1987, pp. 71-92.
Pascarella presents a nontechnical discussion of the benefits and 
problems of relying on value-added data. He suggests several 
different methods of overcoming the problems associated with the 
use of difference scores and presents modeling techniques that 
can be used with value-added data. 

Rogosa, D., Brandt, D., and Zimowski, M. "A Growth Curve Approach
to the Measurement of Change." Psychological Bulletin, vol. 92,
(1982), pp. 726-748. These authors argue that the criticism of
change scores as unreliable does not mean that they should be 
abandoned. 

Tucker, L.R., Damarin, F., and Messick, S. "A Base-Free Measure
of Change." Psychometrika, vol. 31, (1966), pp. 457-473. In this
article, the authors identify and discuss problems with the 
calculation and use of simple gain scores. They recommend the 
use of a base-free measure of change, and provide formulas for 
calculating this measure. 

Willett, J.B., "Questions and Answers in the Measurement of 
Change," in Rothkopf, E.R. (ed.), Review of Research in
Education, volume 14. Washington, D.C.: American Educational
Research Association, 1987. According to Willett, the 
measurement and analysis of growth (change) is central to 
evaluating educational effectiveness. Willett contends that the 
criticisms of growth measures that have been directed at two- 
wave (pre- and posttest) designs are overstated. Although 
Willett identifies instances in which simple difference scores 
can be reliable and valid, he recommends a multi-wave approach to 
measuring change. 

Wolfle, L.M. "Applications of Causal Models in Higher Education,"
in Smart, J.C. (ed.), Higher Education: Handbook of Theory and
Research. New York: Agathon Press, 1985. Wolfle examines the use
of causal modeling as a research tool in the assessment of
educational outcomes, explaining its assumptions and analyzing
the concepts of causation and the decomposition of effects. The
[text missing]



Appendix E:



Review of Assessment Instruments 
by Gary Pike 

This review is designed to provide brief descriptions of the
technical characteristics of many of the instruments mentioned in
this book. For convenience, the descriptions are organized
around six types of outcomes: general education, basic skills,
cognitive development, learning in the discipline, values, and
motivation. Within each outcome area, tests are listed in
alphabetical order.

Assessment of General Education 

Academic Profile 

Publisher: ETS College and University Programs, Educational
Testing Service, Princeton, NJ 08541-0001; Scales: Total Score,
Humanities, Social Science, Natural Sciences, Reading, Writing,
Critical Thinking, and Mathematics; Length: 48-144 items; Time:
1-3 hours.

The Academic Profile has been developed by ETS and the 
College Board to assess the effectiveness of general education 
programs. The Academic Profile is available in two forms: a one- 
hour exam providing group feedback, and a three-hour exam 
providing individual feedback. A panel of experts in the content 
fields supervised test construction, assisting with questions of 
content validity. Because ETS is making the Academic Profile 
available for pilot testing during the 1987-1988 academic year, 
further information about the reliability and validity of this 
test is not available at this time. 

ETS College and University Programs. The Academic Profile.
Princeton: ETS, 1981.



ACT Assessment Program 

Publisher: American College Testing Program, P.O. Box 168, Iowa
City, IA 52240; Scales: Composite Score, English Usage,
Mathematics Usage, Social Studies Reading, Natural Science
Reading; Length: 40-75 items/test; Time: 30-50 minutes/test.

The ACT Assessment Program was developed as a series of
college entrance and placement examinations for high school
graduates. Depending on the coefficients used, reliability
estimates have ranged from .73 to .91. Research has found that
the ACT Assessment is capable of predicting subsequent
performance in college, including cumulative grade point average
and performance in specific classes. However, research at
Tennessee Technological University could not demonstrate a
relationship between gains on the ACT Assessment exam and
students' experiences in college, raising questions about the
validity of the ACT Assessment exam as a measure of educational
effectiveness.

American College Testing Program. Assessing Students on the Way
to College: Technical Report for the ACT Assessment Program.
Iowa City, IA: ACT, 1973.

American College Testing Program. College Student Profiles: Norms
for the ACT Assessment. Iowa City, IA: ACT, 1987.

Dumont, R.G. and Troelstrup, R.L. "Measures and Predictors of
Educational Growth with Four Years of College." Research in
Higher Education, vol. 14, (1981), pp. 31-47.

Munday, L.A. "Correlations Between ACT and Other Predictors of
Academic Success in College." College and University, vol. 44,
(1968), pp. 67-76.

Richards, J.M., Jr., Holland, J.L., and Lutz, S.W. "Prediction
of Student Accomplishment in College." Journal of Educational
Psychology, vol. 58, (1967), pp. 343-355.



College Basic Academic Subjects Examination

Publisher: Center for Educational Assessment, University of
Missouri-Columbia, 403 South Sixth Street, Columbia, MO 65211;
Scales: English, Mathematics (2), Science, Social Studies,
Reading, Reasoning, and Writing (optional); Length: approximately
40-120 items; Time: 1-3 hours.

The College Basic Academic Subjects Examination (College
BASE) is a criterion-referenced achievement test that can be used
to evaluate individuals or programs. One- and three-hour forms of
the exam are available. Content validity of the College BASE was
achieved by using expert reviewers during the test construction
process. Because the exam is being pilot tested during the
1987-88 academic year, additional information on reliability and
validity has not been made available.

Center for Educational Assessment. College BASE. Columbia, MO:
University of Missouri-Columbia, 1987. 



Collegiate Assessment of Academic Proficiency 

Publisher: American College Testing Program, 2201 N. Dodge St.,
P.O. Box 168, Iowa City, Iowa 52243; Scales: Reading,
Mathematics, Writing, and Critical Thinking; Length: 175 items
for all four modules in pilot administration plus 2 prompts for
writing sample; Time: 40 minutes for each module and 40 minutes
for the writing sample.

The Collegiate Assessment of Academic Proficiency (CAAP) is 
a new standardized test intended to assist institutions in 
evaluating their general education programs by assessing those 
academic skills typically developed during the first two years of 
college. The CAAP is available in modules, and institutions may
add questions to the exam, thereby tailoring the exam to their
curriculum. Because the exam is being pilot-tested beginning in
1988, information on reliability and validity is not available.

American College Testing Program. Collegiate Assessment of
Academic Proficiency: Test Specifications and Sample Items. Iowa
City, IA: ACT, 1988.



CLEP Education Assessment Series 

Publisher: The College Board, 45 Columbus Ave., New York, NY
10023-6917; Scales: English Composition, Mathematics; Length:
40-45 questions per scale; Time: 45 minutes per module.

The Education Assessment Series (EAS) consists of two tests
intended to provide comprehensive, nationally normed data in a
relatively short administration time and at low cost. Because
multiple forms of the exams will be available, institutions may
administer them twice and calculate the "value added" by general
education. The tests are being piloted in 1988; hence,
information concerning reliability and validity is not yet
available.

The College Board. CLEP Introduces the Education Assessment
Series. New York: Author, 1988.



CLEP General Education Examinations 

Publisher: College Entrance Examination Board, 45 Columbus Ave.,
New York, NY 10023-6917; Tests: English Composition, Humanities,
Mathematics, Natural Science, and Social Science/History; Length:
55-150 items/test; Time: 90 minutes/test.

The College-Level Examination Program (CLEP) General 
Examinations cover five content areas and were designed to 
provide college credit for non-college learning. Reliabilities 
for the five tests range from .91 to .94. Using panels of experts 
in the content fields, the CLEP test development process has 
achieved satisfactory levels of content validity. While research 
has linked CLEP scores to performance in introductory college 
courses, no studies have been conducted on the validity of the 
CLEP exams as program evaluation instruments.

College Entrance Examination Board. Technical Manual Overview.
Princeton: ETS, 1984.

College Entrance Examination Board. Outcomes Assessment in Higher
Education. Princeton: ETS, 1986.

College Outcome Measures Project 

Publisher: ACT, P.O. Box 168, Iowa City, IA 52243; Scales: Total
Score, Functioning within Social Institutions, Using Science and
Technology, Using the Arts, Communicating, Solving Problems,
Clarifying Values, Writing (CE), Speaking (CE), Reasoning and
Communicating (CE); Length: 60-99 items; Time: 2.5-4.5 hours.

The College Outcome Measures Project (COMP) examination was
designed to measure the knowledge and skills necessary for
effective functioning in adult society. This exam is available
in two forms: the Objective Test (OT), consisting of 60 multiple-
choice items; and the Composite Examination (CE), containing the
same multiple-choice questions and speaking/writing exercises.
Estimates of reliability for the COMP sub-scales were
satisfactory (ranging from .63 to .81), although research on its
validity as an assessment instrument has produced mixed results.
Studies by ACT have shown that COMP scores are related to general
education coursework and student involvement; however, other
research by colleges themselves has failed to find a link between
COMP scores (or gains on the COMP) and effective academic
programs.

Banta, T.W., Lambert, E.W., Pike, G.R., Schmidhammer, J.L. and
Schneider, J.A. "Estimated Student Score Gain on the ACT COMP
Exam: Valid Tool for Institutional Assessment?" Paper presented 
at the annual meeting of the American Educational Research 
Association, Washington, 1987. [ED#281-892] 

Forrest, A. Increasing Student Competence and Persistence: The
Best Case for General Education. Iowa City, IA: ACT National
Center for the Advancement of Educational Practices, 1982.

Forrest, A. and Steele, J.M. Defining and Measuring General
Education Knowledge and Skills. Iowa City, IA: ACT, 1982.

Kitabchi, G. "Multivariate Analysis of Urban Community College
Student Performance on the ACT College Outcomes Measures Program
Test." Paper presented at the annual meeting of the American
Educational Research Association, Chicago, 1985. [ED#261-091]

Steele, J.M. "Assessing Speaking and Writing Proficiency via
Samples of Behavior." Paper presented at the annual meeting of
the Central States Speech Association, 1979. [ED#169-597]



Graduate Record Examinations Program: General Examinations

Publisher: Graduate Record Examinations Board, CN 6000,
Princeton, NJ 08541-6000; Scales: Verbal (antonyms, analogies,
sentence completions, reading passages), Quantitative
(quantitative comparisons, mathematics, data interpretation),
Analytic (analytical reasoning, logical reasoning); Length:
50-76 items per sub-test; Time: 3 hours, 30 minutes.

The General Examinations of the GRE are nationally normed
tests designed to assess learned abilities that are not related
to any particular field of study, but that are related to the
skills necessary for graduate study. Research on the GRE General
Examinations has revealed high levels of reliability (.89 to .92)
for the three tests. Reliability estimates for the nine item-
types are somewhat lower (.60 to .90). Research has also found
that test (and item-type) scores are related to undergraduate
performance as well as to performance in graduate school.

Adelman, C. The Standardized Test Scores of College Graduates,
1964-1982. Washington, D.C.: U.S. Government Printing Office,
1984.

Conrad, L., Trismen, D., and Miller, R. Graduate Record
Examinations Technical Manual. Princeton: ETS, 1977.

Fortna, R.O. Annotated Bibliography of the Graduate Record
Examinations. Princeton: ERIC Clearinghouse on Tests,
Measurement, and Evaluation, 1980.

Graduate Record Examinations Board. GRE Guide to the Use of the
Graduate Record Examinations Program. Princeton: ETS, 1987.

Swinton, S.S. and Powers, D.E. A Study of the Effects of Special
Preparation on GRE Analytical Scores and Item Types. GRE
Research Report GREB 78-2R. Princeton: ETS, 1982.

Wilson, K.M. The Relationship of GRE General Test Item-Type Part
Scores to Undergraduate Grades. GRE Research Report 81-22P.
Princeton: ETS, 1985.

Assessment of Basic Skills 



Descriptive Tests of Language Skills 

Publisher: Descriptive Tests of Language Skills, Educational
Testing Service, Mail Drop 22E, Princeton, NJ 08541; Scales:
Reading Comprehension, Logical Relationships, Vocabulary, Usage,
and Sentence Structure; Length: 30-50 items/test; Time: 15-30
minutes/test.

The Descriptive Tests of Language Skills (DTLS) consist of
five tests designed for the placement of students in college
English classes. These tests may be used separately or in 
combination. Because of their low difficulty levels, the DTLS 
are most appropriate for identifying students in need of 
remediation. Research by ETS indicates that all five tests 
evidence acceptable reliability (from .82 to .89); that the DTLS 
are correlated with writing ability and other measures of 
academic ability, such as ACT scores; and that performance on the 
DTLS predicts college grade point average. Studies have not 
examined the appropriateness of the DTLS as instruments for 
evaluating program quality. 

College Entrance Examination Board. Guide to the Use of the
Descriptive Tests of Language Skills. Princeton: ETS, 1985.

Snowman, J., Leitner, D.W., Snyder, V. and Lockhart, L. "A
Comparison of the Predictive Validities of Selected Academic
Tests of the American College Test (ACT) Assessment Program and
the Descriptive Tests of Language Skills for College Freshmen in
a Basic Skills Program." Educational and Psychological
Measurement, vol. 40, (1980), pp. 1159-1166.

Snyder, V. and Elmore, P.B. "The Predictive Validity of the
Descriptive Tests of Language Skills for Developmental Students
Over a Four-Year College Program." Educational and Psychological
Measurement, vol. 43, (1983), pp. 1113-1122.



Descriptive Tests of Mathematics Skills 

Publisher: Descriptive Tests of Mathematics Skills, Educational
Testing Service, Mail Drop 22E, Princeton, NJ 08541; Scales:
Arithmetic Skills, Elementary Algebra Skills, Intermediate
Algebra Skills, and Functions and Graphs; Length: 30-35
items/test; Time: 30 minutes/test.

The four tests in the Descriptive Tests of Mathematics
Skills (DTMS), used separately or in combination, are designed to
assess mathematics skills for placement purposes. Because of
their low item difficulty levels, these tests are not appropriate
for differentiating among students with high levels of math
skills. Research has indicated that the DTMS examinations
evidence acceptable reliability (.84 to .91) and that the DTMS
are related to measures of academic ability and performance in
introductory math courses, particularly remedial courses.

Bridgeman, B. "Comparative Validity of the College Board
Scholastic Aptitude Test — Mathematics and the Descriptive Tests 
of Mathematics Skills for Predicting Performance in College 
Mathematics Courses." Educational and Psychological Measurement,
vol. 42, (1982), pp. 361-366.

College Entrance Examination Board. Guide to the Use of the
Descriptive Tests of Mathematics Skills. Princeton: ETS, [year illegible].



New Jersey College Basic Skills Placement Tests 

Publisher: NJCBSPT, College Entrance Examination Board,
Educational Testing Service, Princeton, NJ 08541; Scales:
Writing, Reading Comprehension, Sentence Sense, Math Computation,
Elementary Algebra, Composition (composite score), and Total
English (composite score); Length: 168 items; Time: 3 hours.

The New Jersey College Basic Skills Placement Test (NJCBSPT)
consists of five tests designed to meet the requirements of the
assessment and evaluation program developed by the New Jersey
Board of Higher Education. In addition to the five test scores,
two composite scores can be derived from the language assessment
parts of the test. Reliability estimates for the seven subscales
range from .83 to .92. The content validity of the NJCBSPT was
achieved by providing for constant review during test
construction by a panel of experts from the New Jersey Basic
Skills Council. Studies on the construct validity and predictive
validity of the NJCBSPT are currently underway.

College Entrance Examination Board. The New Jersey College Basic
Skills Placement Test Program: Your Information Base for Outcomes
Assessment. Princeton: ETS, 1987.

Office of College Outcomes. Appendices to the Report of the New
Jersey Board of Higher Education from the Advisory Committee to
the College Outcomes Evaluation Program. Trenton, NJ: New Jersey
Department of Higher Education, 1987.



Test of Standard Written English 

Publisher: Test of Standard Written English, College Entrance
Examination Board, Princeton, NJ 08541; Scales: Total Score;
Length: 50 items; Time: 30 minutes.

The Test of Standard Written English (TSWE) is designed to 
measure a student's ability to use the language contained in most 
college textbooks. Research has found that the TSWE evidences 
acceptable reliability, and is predictive of performance in 
freshman English courses. The TSWE also has been found to be 
predictive of performance during the junior year. Indeed, the
TSWE has been found to be as good a predictor of performance as 
longer, more complex exams. 

Bailey, R.L. "The Test of Standard Written English: Another
Look." Measurement and Evaluation in Guidance, vol. 10, (1977),
pp. 70-74.

Michael, W.B. and Shaffer, P. "A Comparison of the Validity of
the Test of Standard Written English (TSWE) and of the California
State University and Colleges English Placement Test (CSUC-EPT)
in the Prediction of Grades in a Basic English Composition Course
and of Overall Freshman-Year Grade Point Average." Educational
and Psychological Measurement, vol. 39, (1979), pp. 131-145.

Suddick, D.E. "A Re-examination of the Use of the Test of
Standard Written English and Resulting Placement for Older Upper-
Division and Master's Level Students." Educational and
Psychological Measurement, vol. 42, (1982), pp. 367-369.

Suddick, D.E. "The Test of Standard Written English and
Resulting Placement Patterns: A Follow-up of Performance of
Older Upper-Division and Master Level Students." Educational and
Psychological Measurement, vol. 41, (1981), pp. 599-601.



Assessment of Cognitive Development 



Analysis of Argument 

Author: David G. Winter, Department of Psychology, Wesleyan
University, Middletown, CT 06457; Scales: Total Score; Length:
two exercises; Time: 10 minutes.

The Analysis of Argument is a production measure designed to 
assess clarity and flexibility of thinking skills. After reading 
a passage representing a particular position on a controversial 
issue, subjects are asked to write a response disagreeing with 
the original position. After 5 minutes, they are then instructed 
to write a short essay that agrees with the original position. 
The two essays are scored using a 10-category scheme. Because 
inter-rater agreement is a function of training, the authors do 
not provide estimates of reliability. The authors do report that 
studies have found that scores on the Analysis of Argument test 
are significantly related to other measures of cognitive 
development, as well as to previous educational experiences. 

Stewart, A.J. and Winter, D.G. Analysis of Argument: An
Empirically Derived Measure of Intellectual Flexibility. Boston:
McBer and Company, 1977. 



Erwin Scale of Intellectual Development 

Author: T. Dary Erwin, Office of Student Assessment, James
Madison University, Harrisonburg, VA 22801; Scales: Dualism,
Relativism, Commitment, Empathy; Length: 86 items; Time: untimed.

The Erwin Scale of Intellectual Development (SID) was
designed to measure intellectual development based on Perry's 
scheme, three of the four sub-scales (dualism, relativism and 
commitment) paralleling Perry's categories of intellectual 
development. Research on the SID has found that all four sub- 
scales evidence acceptable reliability (.70 to .81) and that the 
SID is significantly related to other measures of development, 
including measures of identity and involvement. 

Erwin, T.D. "The Scale of Intellectual Development: Measuring
Perry's Scheme." Journal of College Student Personnel, vol. 24,
(1983), pp. 6-12.

Perry, W.G., Jr. Forms of Intellectual and Ethical Development in
the College Years. New York: Holt, Rinehart and Winston, 1970.



Measure of Epistemological Reflection 

Author: Margaret Baxter-Magolda, Department of Educational
Leadership, Miami University, Oxford, OH; Scales: Total Score;
Length: 6 stimuli; Time: untimed.

The Measure of Epistemological Reflection (MER) represents a
bridge between recognition and production measures. Six stimuli
corresponding to Perry's levels of development are presented to
subjects, who are then asked to justify the reasoning used in
each stimulus. Standardized scoring procedures provide a
quantified measure of intellectual development. Alpha
reliability for the ratings may be as high as .76, while
interrater reliability has ranged from .67 to .80, depending on
the amount of training provided to raters. Research has provided
support for the developmental underpinnings of the MER, revealing
significant score differences for different educational levels.

Baxter-Magolda, M. and Porterfield, W.D. "A New Approach to
Assess Intellectual Development on the Perry Scheme." Journal of
College Student Personnel, vol. 26, (1985), pp. 343-351.



Reflective Judgment Interview

Authors: K.S. Kitchener, School of Education, University of
Denver, Denver, CO, and P.M. King, Department of College Student
Personnel, Bowling Green State University, Bowling Green, OH;
Scales: Total Score; Length: four dilemmas; Time: approximately
40 minutes.

Like the MER, the Reflective Judgment Interview (RJI)
represents a bridge between recognition and production measures.
It consists of four dilemmas which are presented individually to
the subject. Each dilemma is followed by a series of
standardized questions designed to identify which of Perry's
seven stages of intellectual development is being used by the
subject to deal with that dilemma. A subject's score is the
average rating across dilemmas and across raters. Research has
shown that the RJI evidences acceptable levels of reliability
(.71 to .78). In addition, the RJI has been found to be
significantly related to other measures of critical thinking as
well as to levels of education.

Brabeck, M.M. "Critical Thinking Skills and Reflective Judgment
Development: Redefining the Aims of Higher Education."
Journal of Applied Developmental Psychology, vol. 4, (1983),
pp. 23-34.

King, P.M. and Kitchener, K.S. "Reflective Judgment Theory and 
Research: Insights into the Process of Knowing in the College 
Years." Paper presented at the annual meeting of the American 
College Personnel Association, Boston, 1985. [ED#263-821] 

Kitchener, K.S. and King, P.M. "Reflective Judgment: Concepts of
Justification and Their Relationship to Age and Education."
Journal of Applied Developmental Psychology, vol. 2, (1981), pp.
[illegible].


Test of Thematic Analysis 

Author: David G. Winter, Department of Psychology, Wesleyan
University, Middletown, CT 06457; Scales: Total Score (optional:
differentiation, discrimination, integration); Length: one
exercise; Time: approximately 30 minutes.

The Test of Thematic Analysis uses a compare and contrast
format to assess critical thinking skills. Subjects are
presented with two sets of data and are asked to describe (in
writing) how the two sets differ. The content of the essays is
scored on a [illegible]-point scale. In addition, scales derived
from human information processing research can be used to
evaluate the structure of the responses. Studies have found high
levels of inter-rater agreement when scoring the TTA. Test scores
also have been found to be significantly correlated with academic
ability and coursework. In addition, measures of the structural
characteristics of students' essays have been found to be
significantly related to other measures of critical thinking, as
well as to previous educational experiences.

Schroder, H.M., Driver, M.J. and Streufert, S. Human Information
Processing. New York: Holt, Rinehart and Winston, 1967.

[Author and opening of title illegible] An Empirically Derived Measure of
Critical Thinking. Boston: McBer and Company, 1967.

Winter, D.G. and McClelland, D.C. "Thematic Analysis: An
Empirically Derived Measure of the Effects of Liberal Arts
Education." Journal of Educational Psychology, vol. 70, (1978),
pp. 8-16.

Winter, D.G., McClelland, D.C. and Stewart, A.J. A New Case for
the Liberal Arts. San Francisco: Jossey-Bass, 1981.



Watson-Glaser Critical Thinking Appraisal 

Publisher: G. Watson and E.M. Glaser, Harcourt, Brace, and World,
New York, NY; Scales: Total Score, Inference, Recognition of
Assumptions, Deduction, Interpretation, and Evaluation of
Arguments; Length: 100 items; Time: 50 minutes.

The Watson-Glaser Critical Thinking Appraisal (CTA) is a 
multiple-choice measure designed to assess students' critical 
thinking abilities. In addition to a total score, five sub-scores 
can be derived from the CTA. Research has found that the total 
score on the CTA evidences acceptable reliability (.85 to .87) 
over seven norm groups and that students' performance on the CTA 
is positively related to their college experiences. In addition, 
the CTA has been found to be predictive of performance in courses 
emphasizing critical thinking. 

Crites, J.O. "Test Review." Journal of Counseling Psychology,
vol. 12, (1965), pp. 328-330.

Helmstadter, G.C. "Watson-Glaser Critical Thinking Appraisal."
Journal of Educational Measurement, vol. 2, (1965), pp. 254-256.

Westbrook, B.W. and Sellers, J.R. "Critical Thinking,
Intelligence, and Vocabulary." Educational and Psychological
Measurement, vol. 27, (1967), pp. 443-446.

Wilson, D.G. and Wagner, E.E. "The Watson-Glaser Critical
Thinking Appraisal as a Predictor of Performance in a Critical
Thinking Course." Educational and Psychological Measurement,
vol. 41, (1981), pp. 1319-1322.



Assessment of Values 



Defining Issues Test 

Author: James R. Rest, Department of Social, Psychological and
Philosophical Foundations of Education, 330 Burton Hall,
University of Minnesota, Minneapolis, MN 55455; Scales: "p"
score; Length: 72 items; Time: untimed.

Rest developed the Defining Issues Test (DIT), a recognition
measure of moral reasoning, based on the six stages identified by
Kohlberg. Research has indicated that alpha reliability for the
DIT is .77 and test-retest reliability is approximately .80.
Research also has indicated that the DIT is significantly
correlated with other measures of moral development, specifically
Kohlberg's measure, and longitudinal research has found evidence
of progression from lower-ordered to principled reasoning.
Results also indicate that the DIT produces higher scores for
principled reasoning than does Kohlberg's measure, and these
higher scores are not due to upward faking on the DIT. These
results suggest that production and recognition measures provide
significantly different views of moral reasoning.

Biggs, J.A. and Barnett, R. "Moral Judgment Development of
College Students." Research in Higher Education, vol. 14, (1981),
pp. 91-102.

Davison, M.L. and Robbins, S. "The Reliability and Validity of
Objective Indices of Moral Development." Applied Psychological
Measurement, vol. 2, (1978), pp. 391-403.

McGeorge, C. "Susceptibility to Faking the Defining Issues Test
of Moral Development." Developmental Psychology, vol. 11, (1975),
p. 108.

Rest, J.R. "Longitudinal Study of the Defining Issues Test of
Moral Judgement: A Strategy for Analyzing Developmental Change."
Developmental Psychology, vol. 11, (1975), pp. 738-748.

Rest, J.R. Development in Judging Moral Issues. Minneapolis:
University of Minnesota Press, 1979.

Rest, J.R., Cooper, D., Coder, R., Masanz, J. and Anderson, D.
"Judging the Important Issues in Moral Dilemmas - An Objective
Measure of Development." Developmental Psychology, vol. 10,
(1974), pp. 491-501.



Humanitarian/Civic Involvement Values 

Author: Ernest T. Pascarella, College of Education, University of
Illinois at Chicago, Box 4348, Chicago, IL 60680; Scales: Total
Score; Length: 6 items; Time: untimed.

The measure was derived from questions on the survey
designed by the Cooperative Institutional Research Program
(CIRP). Alpha reliability for this scale has been estimated to
be .77. Results of research using this scale indicate that
collegiate academic and social experiences are significantly
related to the development of humanitarian/civic-involvement
values, and that social involvement has the greater impact.



Pascarella, E.T., Ethington, C.A. and Smart, J.C. "The Influence
of College on Humanitarian/Civic-Involvement Values." Paper
presented at the annual meeting of the American Educational
Research Association, Washington, D.C., 1987.



Kohlberg's Measure of Moral Development 

Author: Lawrence Kohlberg, "The Development of Modes of Moral 
Thinking and Choice in the Years Ten to Sixteen." Unpublished 
Doctoral Dissertation, University of Chicago, 1958; Scales : Total 
Score; Length : three dilemmas; Time : untimed. 

In an effort to assess moral reasoning, Kohlberg developed a 
production measure that presents subjects with three moral 
dilemmas and requires them to explain how the dilemmas should be 
resolved. Subjects' responses are scored by raters trained to 
identify the dominant stage of moral reasoning employed. 
Reliability estimates for this technique are well within accepted
limits (above .90). Research has provided support for the 
construct validity of Kohlberg's approach, identifying a clear 
step-by-step progression through the stages of moral reasoning. 
Moral reasoning also has been linked to students' previous 
educational experiences. 

Kohlberg, L. The Psychology of Moral Development. New York:
Harper and Row, 1984.



Rokeach Value Survey

Publisher: Halgren Tests, The Free Press, New York, NY; Scales:
Instrumental Values, Terminal Values; Length: 36 items; Time: untimed.

The Rokeach Value Survey was designed as a means of 
describing subjects' value systems. Respondents are asked to 
rank two sets of values (instrumental and terminal) . Multiple 
administrations of the instrument can be used to measure 
stability and change in value systems. Test-retest reliability 
has been estimated to be adequate (.65 to .74). Moreover, 
research has shown that changes in individuals' value systems can
be linked to life events. 

Rokeach, M. The Nature of Human Values. New York: The Free
Press, 1973.

Rokeach, M. (ed.) Understanding Human Values: Individual and
Societal. New York: The Free Press, 1979.

TABLE 1: ORIGINATING STATES FOR REQUESTS

                            REQUEST FOR    REQUEST FOR    TOTAL
STATE                       PUBLICATIONS   GENERAL INFO   REQUESTS

Alaska                                     1              1
Arkansas                    2              4              6
Alabama                     [illegible]    33             [illegible]
Arizona                     4              13             17
California                  [illegible]    [illegible]    [illegible]
Colorado                    [illegible]    [illegible]    [illegible]
Connecticut                                14             14
Delaware                    1              4              5
Florida                     [illegible]    [illegible]    [illegible]
Georgia                     [illegible]    [illegible]    [illegible]
Hawaii                      [illegible]    [illegible]    [illegible]
[state name missing]        [illegible]    [illegible]    [illegible]
Indiana                     [illegible]    [illegible]    [illegible]
Idaho                       1              [illegible]    [illegible]
Iowa                        [illegible]    [illegible]    [illegible]
Kentucky                    [illegible]    [illegible]    [illegible]
[state name missing]        4              11             15
Louisiana                   [illegible]    [illegible]    [illegible]
Maryland                    [illegible]    [illegible]    [illegible]
Massachusetts               [illegible]    [illegible]    [illegible]
Missouri                    [illegible]    [illegible]    [illegible]
Michigan                    [illegible]    [illegible]    [illegible]
Minnesota                   [illegible]    [illegible]    [illegible]
Mississippi                 [illegible]    [illegible]    [illegible]
[state name illegible]      1              5              6
Nebraska                    1              [illegible]    [illegible]
New Hampshire                              5              5
[state name missing]        [illegible]    10             [illegible]
New Jersey                  [illegible]    [illegible]    [illegible]
New York                    [illegible]    [illegible]    [illegible]
North Carolina              [illegible]    [illegible]    [illegible]
North Dakota                [illegible]    [illegible]    [illegible]
Nevada                                     3              3
Oklahoma                    [illegible]    [illegible]    [illegible]
Oregon                      1              7              8
[state name missing]        6              26             32
Pennsylvania                [illegible]    [illegible]    [illegible]
Rhode Island                1              14             15
South Carolina              6              26             32
South Dakota                               3              3
Tennessee                   8              73             81
Texas                       5              51             56
Virginia                    6              67             73
Vermont                     1              4              5
Utah                        1              2              3
Washington                  1              9              10
Wisconsin                   1              33             34
West Virginia               1              5              6
Wyoming                     1              1              2
District of Columbia        3              52             55
Puerto Rico                 2              9              11
Canada                                     5              5
Australia                                  2              2
Israel                                     1              1
Germany                                    1              1
Netherlands                                2              2
Unknown                                    34             34

Total                       158            1245           1403


TABLE 2: REQUESTS BY MONTH RECEIVED

                      REQUEST FOR    REQUEST FOR    TOTAL
MONTH                 PUBLICATIONS   GENERAL INFO   REQUESTS

October, 1986                        10             10
November, 1986                       4              4
December, 1986                       5              5
January, 1987                        13             13
February, 1987                       22             22
March, 1987                          29             29
April, 1987                          17             17
May, 1987                            57             57
June, 1987                           13             13
July, 1987                           20             20
August, 1987                         33             33
September, 1987                      16             16
October, 1987                        15             15
November, 1987                       37             37
December, 1987                       55             55
January, 1988                        25             25
February, 1988                       18             18
March, 1988                          25             25
April, 1988                          27             27
May, 1988                            25             25
June, 1988                           29             29
July, 1988            3              23             26
August, 1988          12             41             53
September, 1988       21             37             58
October, 1988         20             57             77
November, 1988        11             30             41
December, 1988        10             11             21
January, 1989         10             1              11
February, 1989        7                             7
March, 1989           4              8              12
April, 1989           4              12             16
May, 1989             12             5              17
June, 1989            8              13             21
July, 1989            21             13             34
August, 1989          5              7              12
September, 1989       10             3              13
Not Dated                            487            487

Total                 158            1243           1401

TABLE 3: REQUESTS BY INSTITUTION

                        REQUEST FOR    REQUEST FOR    TOTAL
INSTITUTION             PUBLICATIONS   GENERAL INFO   REQUESTS

Universities            82             561            643
Colleges                46             263            309
Community Colleges      23             119            142
Vo/Tech Schools         3              10             13
Professional Schools    1              14             15
Government Agency       1              57             58
Private Concerns        1              17             18
Others                                 80             80
Unknown                 1              122            123

Total                   158            1243           1401

TABLE 4: QUALITY OF ARC PUBLICATIONS & SERVICES

                                          QUALITY   NUMBER OF
PUBLICATION OR SERVICE                    RATING    RESPONDENTS

Telephone Conversation                    4.45      31
Assessment Bibliography                   4.49      55
Articles on Assessment at UTK             4.06      54
Satisfaction Surveys used at UTK          3.92      39
Bibliography of Assessment Instruments    4.27      48
ARC Workshop Materials                    4.14      35
Material on Locally Developed Tests       3.48      25
Research Papers on Assessment             4.00      34
Conference Papers or Presentations        4.00      43

Average Quality Rating = 4.13

Scale was a 5-point Likert Scale ranging from 1 = poor quality
to 5 = excellent quality.
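
The report does not say how the average quality rating was computed. As a hedged arithmetic check, the Python sketch below compares a simple mean of the nine ratings with a mean weighted by number of respondents; the ratings and respondent counts are the Table 4 values as read from the scanned page, and the variable names are illustrative. Under that reading, the reported 4.13 matches the respondent-weighted mean rather than the simple mean.

```python
# (quality rating, number of respondents) for each Table 4 row,
# as read from the scanned table (an assumption of this sketch).
rows = [
    (4.45, 31), (4.49, 55), (4.06, 54),
    (3.92, 39), (4.27, 48), (4.14, 35),
    (3.48, 25), (4.00, 34), (4.00, 43),
]

# Simple mean: every publication or service counts equally.
simple_mean = sum(rating for rating, _ in rows) / len(rows)

# Weighted mean: every respondent counts equally.
total_respondents = sum(n for _, n in rows)
weighted_mean = sum(rating * n for rating, n in rows) / total_respondents

print(round(simple_mean, 2), round(weighted_mean, 2))
```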
TABLE 5: RESULTS OF THE ARC QUALITY SURVEY

1. The material answered my questions                        3.90

2. Center staff were helpful and courteous                   4.32

3. Material was clearly written                              4.03

4. Copies received were complete & readable                  4.04

5. The time it took to get the material was reasonable       3.97

6. The cost of the material was reasonable                   4.20

7. Information sent was helpful in implementing program      3.76

8. Overall, the service provided was useful in meeting
   our needs.                                                4.03

Average item rating = 4.03

Scale was a 5-point Likert Scale ranging from 1 = strongly
disagree to 5 = strongly agree.
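
The average item rating can be checked from the eight item means listed in Table 5. The short Python sketch below does only that arithmetic; the variable names are illustrative.

```python
# Mean ratings for survey items 1 through 8, as listed in Table 5.
item_means = [3.90, 4.32, 4.03, 4.04, 3.97, 4.20, 3.76, 4.03]

# Unweighted average of the item means.
average_item_rating = sum(item_means) / len(item_means)

print(round(average_item_rating, 2))
```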
TABLE 6: WORKSHOP EVALUATIONS

                                          MARCH   NOVEMBER   NOVEMBER
QUESTION                                  1987    1987       1988

In comparison with other similar
workshops you have attended, how
would you rate the overall quality of
the assessment workshop?                  7.7     7.7        7.0

How would you rate the overall
effectiveness of the workshop in
addressing the assessment
needs/questions of your institution?      7.5     7.3        6.6

Location of the workshop                  7.9     6.6        7.3

Meeting facilities                        7.7     7.0        7.5

Information received prior to the
workshop                                  6.6     5.3        6.5

Registration assistance                   8.7     8.2        8.5

Organization of the program                       7.7        7.5

Opportunities to learn from others                7.7        7.2

Workshop materials                        7.7     8.0        8.0

Length of the workshop                    7.9     7.8        7.5

Workshop size                                     8.1        7.0

Quality of the Consultant Panel
Presentations                             7.4     7.6        7.0

Effectiveness of the Panel                7.4     7.3        6.6

Content of the concurrent sessions        8.1     7.8        7.2

Presentation of the content in the
concurrent sessions                       8.1     8.4        7.3

Effectiveness of the concurrent
sessions                                          7.1        6.6

Interactions with similar
institutions                              6.1     6.5        6.7

Interactions with consultants             7.3     7.3        7.2

TABLE 7: Workshop Quality
August 1987

QUESTION                                                                  MEANS

1. A. In comparison with other similar workshops you have attended, how   7.30
      would you rate the overall quality of this workshop?

   B. How would you rate the overall effectiveness of this workshop in    6.71
      addressing the assessment needs/questions of your institutions?

   C. Please rate each of the following:

      1. Location of the workshop (Knoxville)                             8.10
      2. Meeting facilities                                               8.24
      3. Information received prior to the workshop                       7.66
      4. Registration assistance                                          8.98
      5. Organization of the program                                      7.61
      6. Opportunities to learn from others (at UTK and other
         institutions)                                                    7.76
      7. Workshop materials                                               8.05
      8. Length of the workshop                                           8.24
      9. Workshop size                                                    6.20

   D. Comments on general workshop quality and adequacy in meeting your needs:

      Participants generally rated the quality and relevance of the workshop
      as good. The information provided was perceived as thorough and well
      documented. Suggestions included smaller workshops of similar size
      colleges with more interaction (question/answer format). Additional
      requests included more information on resource allocation, cognitive
      style, assessment of non-cognitive areas, and a notebook containing
      outlines of all presentations.

EFFECTIVENESS OF THE PANEL



QUESTION MEANS

2. A. How would you rate the effectiveness of the discussion sessions in 6.72
addressing the assessment needs/questions of your institution?

B. Using the 10-point scale, how would you rate each of the individual
presentations?

1. "Using Outcomes Information to Assess the Accomplishment of Institu- 7.90 
tional Mission and Goals" (Homer S. Fisher & Trudy W. Banta) 

2. "System Environment Success Factors" 6.31 
(F. Ramsey Valentine) 

3. "Using Existing Campus Information" 6.10 
(John T. Henmeter) 

4. "Gathering Information Through Surveys" 8.13
(William Lyons) 

5. "Gathering and Using Test Data" 6.44 
(Edward L. fedford) 

6. "Work of the Assessment Resource Center" 6.71 
(Gary R. Pike) 

7. "Technical Considerations in Using Surveys" 7.20
(Henmeter, Lyons, & Scroggins) 

8. "Technical Considerations in Using Test Data" 6.16
(Edward L. tedford) 

9. "Motivating Participants and Reporting Results" 8.08
(Trudy W. Banta) 

10. "Research Using Assessment Data" 7.80 
(Gary R. Pike) 
April 15, 1988

Bethany Lutheran College
Office of the Registrar

Trudy Banta
University of Tennessee-Knoxville
2046 Terrace Avenue
Knoxville, Tennessee 37996-3504

Dear Trudy:

I want to take this opportunity to thank you and Gary for all your efforts
in coordinating and sponsoring our recent COMP workshop in Kansas City.
The conversations with the ACT folks and within the entire group were
interesting and for me very educational. Many questions about COMP were
raised which I had never really confronted.

I don't know if we will be able to meet together as a group, but I hope
we will be able to continue to communicate in the future. Thanks again!

Sincerely,

Ronald J. Yiftinge
Registrar

RJY/dg

Bemidji State University
Bemidji, Minnesota 56601

HONORS COUNCIL
218-755-3984

CROSS-NATIONAL ASSESSMENT
OF HIGHER EDUCATION 
John Harris 
Samford University 
October, 1989 

Primary Audiences 

This report is intended for two audiences: 

1. Officials in the U.S. Department of Education (USDE) who may
use it to develop a Request-for-Proposals. 

2. The Study Group on Evaluation of Higher Education sponsored 
by the Programme on Institutional Management in Higher 
Education (IMHE) of the Organization for Economic Cooperation 
and Development (OECD), Paris. 

U.S. Questions 

There appear to be three cross-national assessment questions that 
interest Federal officials involved in policies dealing with American higher 
education. 

1. How does the quality of American higher education compare
with that of other technological-industrial nations, i.e., Australia,
Canada, England, France, Japan, Russia, Sweden, and West Germany?

2. Are there advantages for U.S. colleges and universities as well 
as the U.S. economy in some type of linkage with the European 
Community Course Credit Transfer System of the EC's Erasmus 
Bureau? If there are, what type of linkage could be 
established? 

3. What are the free trade and balance of payments implications 
for freer cross-national recognition of professional credentials? 
If there are compelling advantages, what are the feasible first
steps toward increased cross-national recognition?

Background Documents

With this report, the author completes his exploration of the 
possibility of a cross-national comparison of higher education. This 
exploration began in 1988 with the author's appointment to the five 
person Study Group on Evaluation of Higher Education sponsored by the 
Program on Institutional Management of Higher Education of the OECD. 

The author has explored the feasibility of a direct comparison of the 
academic achievement of American and European students through testing. 
The interest in the feasibility of such a comparison originated with the 
Director of the Fund for the Improvement of Postsecondary Education (FIPSE).

The author's exploration of this possibility is reflected in the 
following documents: 

1. "Cross National Assessment of Student Achievement: Issues 
and Opportunities To Be Explored," First Draft: January 16, 
1989; Second Draft: January 31, 1989. 

2. "Addendum." This "Addendum" to the prospectus identified in
#1 above was drafted in Paris in the week of January 23, 1989.
It describes 12 papers (not counting the first one, Rationale
and Plan) to be written by various authors on different aspects
of a cross-national comparison of student achievement.
Authors were identified for nine of the twelve papers and eight
were eventually written and submitted to the Study Group.

3. Eight Mini-Papers 

(1) Alan Wagner (CERI-OECD), "The Context for a Cross- 
National Assessment of Student Achievement." 

(2) Urban Dahllöf, Professor of Education (Uppsala University,
Sweden), "The Feasibility Study of Cross-National 
Assessment of Student Achievement in Higher Education." 
(Information needed to determine causes for deficiencies 
signalled by indicators as part of a strategy for 
improvements). 

(3) Op cit. "Special Methodology Mini-Paper on Secondary 
School Preparation." 

(4) Fritz Dalichow (Erasmus Bureau-Brussels), "Method of 
comparing higher education degree programs in terms of 
program length or duration and subject controls." 

(5) Trudy Banta (University of Tennessee), "Possibilities For 
Cross-National Comparisons." (Method of comparing 
content and levels of expectation on examinations used at 
the time of degree completion whether for assessment of 
individual student achievement or for program 
evaluation in different degree programs, i.e. chemistry, 
history, etc.) 

(6) Charlotte Kuh (Educational Testing Service), "On the Use 
of the Graduate Record Examinations as an Indicator of 
Cross-National Achievement." (Exploration of how the 
GRE Subject Examinations or other similar examinations 
could be used to provide internal comparisons of student 
achievement in under-graduate degree programs, i.e. 
business administration, chemistry, history, elementary 
education, etc.) 

(7) Richard L. Ferguson (American College Testing), "Using
Certification And Licensure Examinations To Compare 
Cross National Levels of Performance In The Professions." 
(Method of comparing content and pass levels of 
examinations that admit candidates to professional 
practice in such fields as, medicine, mechanical 
engineering, accounting (sub-field of business 
administration), and elementary or secondary school 
teaching.) 

(8) Roeland in 't Veld (Erasmus University, Netherlands),
"Selection by Multi-National Corporations." (Method of 
investigating if and how multi-national corporations 
observe and possibly use differences in degree program 
outputs in recruiting and hiring executives or technical 
experts in such fields as business administration, 
chemistry, and mechanical engineering.) 

4. Memo to Mini-Paper Authors: "Status of Cross-National
Assessment Possibility." This memo summarizes the thinking 
of the Study Group and of the author on how higher education 
may be compared cross-nationally. It does not deal with the 
economic-labor issue involved in cross national recognition of 
professional credentials and transfer of academic credit. The 
economic-labor issue arose after this June, 1989, memo was 
drafted. 

5. "Significant U.S.A. Assessment Documents" is an annotated
bibliography prepared by the author at the request of the
Study Group. The author provided copies of the publications
included in this bibliography to the IMHE staff at the second
meeting of the Study Group in 1988 in Paris.

Objective

The objective of this proposed cross-national assessment of higher
education is to provide American higher education policy makers
information on:

1. How American higher education compares with that of other
industrialized nations in terms of 1) secondary school
preparation, 2) selectivity, 3) curricular content, 4) expected
student achievements as evidenced by examination procedures,
and 5) degree completion rates.

2. The opportunities that the development and implementation of
the European Community's policies and procedures for
cross-national transfer of academic credit and the cross-national
recognition of professional credentials may present for boosting
educational exports and free trade in professional and business
services.

Focusing Considerations

A general study plan is proposed later in this report. It is predicated
on the following "focusing considerations":

Preparation, and Process. The original interest in cross national 
comparison was on students' tested performance. As indicated in the 
"background documents," international testing of university students does 

David Lipscomb University



January 6, 1989 



Dr. Trudy Banta 

Learning Research Center 

The University of Tennessee, Knoxville 

Knoxville, TN 37996-4350 

Dear Dr. Banta:

The discussion sessions you have convened on assessment have
been very helpful to me. As you know, I have been professionally
engaged in assessment theory and work for some time. The general
conferences that are held are of little value to me professionally. 
In contrast, your sessions allow those of us who are deeply 
involved to explore theoretical and implementation issues that help 
us advance the state of the art. It is important, therefore, that 
your Center receive additional funds for these activities. 

My personal view is that we need considerable discussion on 
how assessment data can be collected, arrayed, and used for
continuous improvement of educational practice. Therefore, I would 
favor sessions with experts, perhaps from business or industry, who 
use numbers for the improvement of quality of services and 
products. Higher educators can learn a great deal from them. 




Sincerely, 



JH/bd 
Nashville, Tennessee




Alverno College



3401 S. 39th Street 
Milwaukee, WI
53215-4020 
(414) 382-6000 



OFFICE OF RESEARCH AND EVALUATION 



January 12, 1989



Dr. Trudy Banta

Learning Research Center

1819 Andy Holt Avenue

The University of Tennessee-Knoxville

Knoxville, Tennessee 37996-4350



Dear Dr. Banta,

This letter is in support of your efforts to obtain FIPSE funds for some 
additional dissemination activities for your third year project. The goal for 
these funds is to support some continuing seminars for those of us working to 
resolve the emerging measurement problems in higher education assessment. We 
understand that you are requesting funds for seminars similar to those you 
planned for 1988-89 on assessment issues. 

We are very much in support of your efforts to organize some additional meetings 
of a small but experienced group who are actively engaged in creating assessment 
practices. We are particularly interested in those practices related to 
establishing the validity of assessment processes and instruments used by 
institutions to validate student outcomes of college—and to establish the 
validity of curricula. 

In our view, both assessment processes and instruments designed by faculty to 
assess and credential individual student learning, and those designed by an 
institution's assessment specialists to establish the validity of curricula and
programs are in need of careful validation. These locally-designed instruments 
are critical to the growing assessment movement, since many institutions and 
their faculties find that off-the-shelf measures created by testing companies 
are not useful for the more individualized purposes of institutions. 

We believe that the validity of an assessment process can only be determined in
light of educational purposes and principles, and that therefore we need new 
ways to think about validity, ways that begin with and are driven by educational 
purposes, not by the traditional purposes of measurement alone. In the current 
educational reform movement, educational purposes and principles are being 
re-examined, as are approaches to assessment. 

It is not surprising that those of us in the thick of this re-examination may 
need some particular strategies to help us confront the measurement issues that
are emerging. We need an atmosphere of support where new ideas can be tried out 
and problems can be identified and solved. 

It is the consensus of our seminar group that reviewing these issues together 
will ultimately lead to problem identification—which is one of the main 
outcomes of our meetings so far. As a group, we are particularly interested in 
continuing these seminars because we find that assessment practice is currently
being created to resolve various problems, often out of the experience of those 
faculty and other assessment specialists who are dealing with quite a different 
set of problems than has been faced previously by the measurement and evaluation 
professional community. Once we can identify problems in these seminars as a 
group, we can identify those individuals who might consult with us to resolve 
these problems. We will also find out which problems are currently wide-open, 
where there are no consultants who can help, and where we will have to find 
solutions out of our combined experience and own experimentation. 

For example, each participating group member has independently come to the 
realization that finding ways to validate instruments that use open ended 
responses leads us to consider issues that our measurement consultants agree lie 
in uncharted territory. It is unlikely that we would have come to this insight 
had we not had the opportunity to formulate our thinking as a group -- to examine
what issues we had in common, and then to jointly pursue some means to solving 
our problems. In addition, we are identifying for ourselves what the current 
barriers to solutions are across institutions so that we can decide which 
problems and their solutions are most critical, and where resolution is most 
likely to benefit the broadest group of institutions. 

The group process has been enormously valuable. It gives each of us confidence 
that we are not alone. The process demands clear thinking from each of us, and 
forces us to find the nub of the problems we are confronting. The opportunity 
to clarify in dialogue with others who have some understanding of the issues 
involved has been helpful because they are engaged in the same enterprise on a 
day to day basis. The amount of time we have created to be together takes us 
beyond simplistic definitions or the complaint stage of problem finding. 
Because we have different institutional contexts, we have a chance to identify 
commonalities in experience, as well as to share strategies so that each does 
not have to reinvent the measurement wheel. 

Frankly, the meetings with consultants have been most helpful for us at Alverno, 
because the meetings have forced us to clarify our own positions on why and 
where current measurement theory has not been able to provide textbook answers 
to validating instruments. To say that we need new measurement approaches to 
establishing the validity of instruments (e.g., where the student performance 
rather than the item is the unit of analysis; where an open ended, complex,
interactive, sustained, dynamic response is the type of response rather than a 
recognition response; where responses are performance based) has meant that we 
have had to clarify the nature of the problems we are finding. To say that 
qualitative validation strategies must be applied and that we have to study what 
it means to establish the validity of complex expert judgment (e.g., where the 
assessment situation can change per student; where the criteria applied can 
change per student) means reconsideration of just how we present the problem to 
the measurement community. 

We emphasize how important it is to meet together, to know and trust one 
another, so we are willing to admit to our most difficult problems. Often these 
are problems that the professional measurement community claims to have solved. 
For example, there is widespread agreement that establishing the validity of 
expert judgment means, in part, establishing high interjudge agreement. It 
takes a supportive atmosphere and the support of colleagues who can imagine the 
problem to help us say publicly "no, not in every case" and then to respond to 
the polite "here's the text from Measurement 101; just add up times they agree 
and get a percent agreement" with the challenge "that doesn't necessarily answer 
the question and I am not sure why." Such dialogue has helped us sort out 
problems where answers exist from those that have no answers. Engaging 
measurement specialists in the search for new solutions becomes an essential 
part of our seminar strategy. 

Now for the practical side. Many of us in this group of seminar participants 
are from institutions that are the most experienced in assessment. We find it 
difficult to meet at those professional meetings we jointly attend. We are 
usually presenting at these meetings and using outside session time to meet with 
people who are pressing us for what we already know. Many of us are making 
presentations on assessment at many professional meetings and cannot stay for a 
whole meeting. Our schedules conflict and allow only a single lunch or dinner 
meeting. Further, our institutional support goes for travel to professional 
meetings, and phone, computer and mail costs to maintain relationships with each 
other after our "seminars." We cannot afford additional travel. 

In sum, the issues are there, the problems are critical, and the support is 
essential. We are in strong support, Dr. Banta, of your request to acquire 
FIPSE funds to help us all continue the work we have really just begun. We are 
prepared to contribute the time away from our other pressing concerns to begin 
to deal seriously with the measurement issues confronting higher education 
assessment that can benefit a range of institutions. We hope that FIPSE can 
provide some financial support. 




Marcia Mentkowski, PhD 
Director 

Office of Research and Evaluation 
Professor of Psychology 



Glen Rogers, PhD 

Research Associate 

Office of Research and Evaluation 
Dr. Trudy Banta 
Learning Research Center 
University of Tennessee 
1819 Andy Holt Avenue 
Knoxville, TN 37996-4350 

Dear Trudy, 

I am writing in support of the University of Tennessee's effort, under 
FIPSE sponsorship, to extend the assessment director seminars. 
These seminars bring together the most active people in 
assessment from colleges and universities around the country. 
From my perspective, these seminars serve several purposes: 

1. To bring together professionals from active programs to 
meet, share ideas, and build informal support networks, 

2. To allow for in-depth discussion of some advanced 
topics usually stimulated by an outside consultant, 

3. To allow for focused discussions on future directions 
for the assessment movement in higher education. 

In addition, most of us in these seminars feel a responsibility 
to work with other institutions which are relatively new to 
assessment. For example, my institution was chosen by our state 
to serve as a model for other Virginia institutions. The ideas 
from professionals who are conducting more sophisticated research 
and practice also filter out to a broader audience beyond those 
who attend the seminars. 

At my institution, no money is allocated for these types of 
seminars. The last seminar which I attended at Princeton was 
graciously funded by The University of Tennessee using FIPSE 
monies. I will not be able to attend the Milwaukee seminar due 
to lack of funds. 
I hope these seminars are able to continue, and I appreciate the 
role FIPSE has played in funding support for assessment around 
the country. 

Sincerely, 



T. Dary Erwin 
Director 